ABSTRACT

A system and method provides universal access to voice-based documents containing information formatted using MIME and HTML standards using customized extensions for voice information access and navigation. These voice documents are linked using HTML hyper-links that are accessible to subscribers using voice commands, touch-tone inputs and other selection means. These voice documents and the components in them are addressable using HTML anchors embedding HTML universal resource locators (URLs), rendering them universally accessible over the Internet. This collection of connected documents forms a voice web. The voice web includes subscriber-specific documents including speech training files for speaker dependent speech recognition, voice print files for authenticating the identity of a user, and personal preference and attribute files for customizing other aspects of the system in accordance with a specific subscriber.
SYSTEM AND METHOD FOR PROVIDING
AND USING UNIVERSALLY ACCESSIBLE 
VOICE AND SPEECH DATA FILES 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

This invention relates generally to the construction and use of distributed interactive voice and speech processing systems, including interactive voice response (IVR) systems and voice messaging (VM) systems. More particularly, the invention relates to form based publishing of voice information and the use of universally accessible personal profiles for authentication of the user by voice signatures and for generating context sensitive active vocabularies to improve speaker dependent speech recognition. The invention also relates to the use of the user attributes and preferences stored in universally accessible personal profiles to improve the efficiency of navigation and search as well as the efficacy of search results pertaining to user queries.

2. Description of the Related Art 

Conventional interactive voice response (IVR) systems 
allow a user to place a telephone call into a system, navigate 
(generally using touch tone input) through a hierarchy of 
options in response to voice prompts and retrieve informa- 
tion stored in a computer database. Airlines, banks, credit 
companies and many other service organizations are just a 
few examples of the types of businesses using IVR systems 
to allow a customer (or prospective customer) to retrieve 
desired information. These conventional systems are gener- 
ally organization-specific in that they offer access to a single 
database or set of databases related to the goods, services or 
other aspects of the organization maintaining the IVR system. Thus, conventional IVR technology is used to offer access to information specific to a single organization (e.g. a specific airline, bank or credit company). For example, airlines typically use IVR to allow callers to access flight arrival and departure information or to select reservation options, for the particular airline only.

It is desirable to provide an IVR system that enables 
access to an aggregation of databases and services rather 
than a single database and service. One barrier to the 
provision of aggregated services in an IVR system is that 
conventional IVR systems do not have a distributed infor- 
mation publishing means. Conventional IVR systems do not 
have a mechanism for service/information providers to 
readily access the IVR system and add updated or entirely 
new information for publication on the IVR system. 

Further, conventional IVR systems are generally config- 
ured for uniform access by any caller admitted to the IVR 
system. Each caller is handled by the system in the same 
manner and offered an identical set of options. One reason 
that IVR systems use uniform user interfaces for each caller 
rather than caller-specific configurations is that conventional 
IVR systems operate in "closed" computer environments 
hosting the particular IVR system. Thus, when a caller accesses a conventional IVR system, the only caller-specific information which the system has at its disposal is any information previously provided by the caller which the system has maintained or any information that is provided by the caller during the IVR session (e.g. when a user enters an account number using touch tone telephone input).
Because, however, collecting and storing caller-specific 
information with conventional technology is cumbersome 
and time consuming, most IVR systems do not offer caller- 
specific (caller customized) features. 

There are numerous applications in which it is desirable 
for an IVR system to use caller-specific information in 
handling a call. Caller-specific information in the form of
user preferences can aid in minimizing the size of a com- 
mand tree which the user must navigate to access desired 
information. Additionally, caller-specific information could also be used to authenticate the identity of a user in cases where security is an issue (e.g. in bank and credit contexts). Further, caller-specific speech training profiles could be used to implement speaker dependent speech recognition to allow a caller to use voice commands in place of touch-tone commands. Still further, an IVR system having access to caller-specific data could be used to apply IVR technology in new application areas such as personal productivity.

Thus, there is a need for an improved voice and speech processing system that provides universal access to caller-specific information to provide user-customized IVR systems. Further, there is a need to provide universal access to voice and speech files in order to allow widespread use of such files for caller authentication and for performing speaker dependent speech recognition in IVR systems.

SUMMARY OF THE INVENTION

The system and method of the present invention extend the World Wide Web (referred to herein as "www" or the "web") and Internet technology to provide universally accessible caller-specific profiles that are accessed by one or more IVR systems. The invention features a set of web pages containing information (components) formatted using MIME and hypertext markup language (HTML) standards with extensions for voice information access and navigation. These web pages are linked using HTML hyper-links that are accessible to users via voice commands and touch-tone inputs. These web pages and the components in them are addressable using HTML anchors and links embedding HTML universal (uniform) resource locators (URLs), rendering them universally accessible over the Internet. This collection of connected web pages is referred to herein as the "voice web" and the individual pages are referred to herein as "voice web pages". Each web page in the voice web contains a specially tagged set of key words and touch tone sequences that are associated with embedded anchors and links used for navigation within the web.

In addition, the invention features a set of linked HTML pages representing the user's "personal profile". The personal profile contains the user's attributes and preferences. Attributes include the user's name, address, phone number, personal identification code, voice imprints for authentication, speech training profile and other information. Preferences include configuration preferences such as personal greetings and gender and language selection, selection preferences such as bookmarks and favorite places, and presentation preferences such as priority ordering, default overrides and preferred vocabulary.
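The split between attributes and preferences can be pictured as a simple record. The sketch below is a minimal illustration only, assuming an in-memory view of the linked profile pages; the class name, field names and the `apply_presentation_preferences` helper are hypothetical and are not taken from the HVML specification in Appendix A. It shows how a presentation preference such as priority ordering could both restrict and order a query result, for example a set of stock quotes.

```python
from dataclasses import dataclass, field


@dataclass
class PersonalProfile:
    """Hypothetical in-memory view of a subscriber's linked profile pages."""

    # Attributes (a subset of those listed above)
    name: str
    phone_number: str
    pin: str
    # Selection preferences, e.g. bookmarked page URLs
    bookmarks: list[str] = field(default_factory=list)
    # Presentation preference: items of interest in the subscriber's priority order
    priority_order: list[str] = field(default_factory=list)


def apply_presentation_preferences(
    profile: PersonalProfile, results: dict[str, str]
) -> list[tuple[str, str]]:
    """Keep only results named in the profile, in the subscriber's priority order."""
    return [(key, results[key]) for key in profile.priority_order if key in results]


profile = PersonalProfile(
    name="A. Subscriber", phone_number="555-0100", pin="1234",
    priority_order=["XYZ", "ABC"],
)
quotes = {"ABC": "42.10", "DEF": "17.00", "XYZ": "8.25"}
print(apply_presentation_preferences(profile, quotes))  # [('XYZ', '8.25'), ('ABC', '42.10')]
```

Filtering against the profile before presentation means the caller hears only the items of interest, in the preferred order, rather than navigating a full result set by voice.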

The personal profile is designed for component access within web pages, allowing easy extraction of context sensitive profile information. In particular, speech training profiles (included as a user attribute and containing word patterns representing speaker dependent training information) are partitioned into sets of related words likely to occur in combination within corresponding voice web pages. A set of command and control words such as "play, pause, continue, previous, next, home, reload, help, etc." is stored in a top level component set, enabling user dependent but context independent navigation and control. Other component sets are designed to match the key word sets in corresponding voice web pages, such as a calendar page or an address book page, enabling user and context dependent navigation and control.
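The partitioning can be sketched as a lookup that merges the always-active command and control set with the set for the page being played. This is a minimal sketch under stated assumptions: the profile is modeled as a dictionary of component sets, the set names `command_and_control`, `calendar` and `address_book` are hypothetical, and placeholder strings stand in for the acoustic training patterns a real speaker dependent recognizer would load.

```python
# Hypothetical speech profile layout: component set name -> {key word: training pattern}.
# Placeholder strings stand in for real acoustic training patterns.
SpeechProfile = dict[str, dict[str, str]]

COMMAND_SET = "command_and_control"  # top level, context independent component set


def active_vocabulary(profile: SpeechProfile, page_set: str) -> dict[str, str]:
    """Return only the training patterns needed for the page currently being played.

    The context independent command and control set is always active; the
    component set matching the current page's key words is added on top.
    """
    vocabulary = dict(profile.get(COMMAND_SET, {}))
    vocabulary.update(profile.get(page_set, {}))
    return vocabulary


def patterns(*words: str) -> dict[str, str]:
    """Build placeholder training patterns for the given key words."""
    return {word: f"<pattern:{word}>" for word in words}


profile: SpeechProfile = {
    COMMAND_SET: patterns("play", "pause", "next", "home", "help"),
    "calendar": patterns("today", "tomorrow", "appointment"),
    "address_book": patterns("lookup", "call"),
}

print(sorted(active_vocabulary(profile, "calendar")))
# ['appointment', 'help', 'home', 'next', 'pause', 'play', 'today', 'tomorrow']
```

Because the address book words are never loaded while the calendar page is playing, the recognizer has a smaller active vocabulary to discriminate among.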
When a user calls into the distributed voice and speech processing system associated with the voice web, the system first identifies the user utilizing a unique account number (such as a phone number or social security number). Next, it accesses the user's personal profile using the corresponding URL and retrieves the user attributes and preferences related to authentication and security. Using this personal profile information, the voice web system authenticates the identity of the user using a combination of personal identification code based password checking and voice imprint matching. The voice imprint is any sufficiently long utterance or phrase that the user has previously entered into his/her profile. Each user's voice imprint is analyzed and stored in the profile for quick matching on demand with a real-time user-provided sample. The combination of every individual's unique vocal characteristics stored in the voice imprint, coupled with the random choice of the password phrase, ensures a high degree of security and authentication.

Once authenticated, the user is allowed to navigate and access more information from the voice web using voice commands. To accomplish this task effectively, the voice web system retrieves the context independent command and control key word set from the user's speech profile.

The voice web system then presents a top level voice web personal home page for the user's perusal. At the same time, it retrieves the set of word recognition patterns associated with the key words in the presented page from the user's speech profile. Thus, the system is able to match the active vocabulary and associated speaker dependent word patterns dynamically in a context sensitive manner. The process continues as the user navigates from page to page. The voice web system dynamically retrieves the suitable subset of training word patterns from the user's speech profile matching the voice navigation key words in the page being presented to the user.

The process described above greatly reduces the size of the training information that needs to be retrieved at any time while significantly enhancing the accuracy of speech recognition using speaker dependent training profiles. Since the speech profile is constructed using HTML pages and components, it is universally accessible using its URL. This enables the user to call into any compatible Internet-connected voice web system in the user's proximity from anywhere in the world, identify himself/herself to the system and then enable the system to dynamically retrieve suitable information that enhances his/her navigation and access of the information stored in the voice web using voice commands and input.

In addition to the user attribute information discussed above, the personal profile contains user preferences relative to configuration, presentation and information selection. These preferences are components within the personal profile pages and are easily available to the voice web system for dynamic retrieval. For example, if the user requests his/her stock portfolio from the voice web, the system first retrieves the user's preferred portfolio of companies from his/her profile and applies this list to limit the search on stock quotes from all companies. The user gets exactly the information relevant to his/her interest in exactly the order of priority he/she prefers.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a functional block diagram of a voice web system in accordance with the present invention.

FIG. 2A is a functional block diagram of the voice web system shown in FIG. 1 configured to provide voice web services.

FIG. 2B is a functional block diagram of an exemplary calendar service.

FIG. 2C is a functional block diagram of an alternative configuration of a voice web system in accordance with the present invention.

FIG. 3 illustrates a personal voice web used to provide personal services using the system shown in FIG. 2A.

FIG. 4 illustrates a hierarchy of speech training pages that correspond to the service pages shown in FIG. 3.

FIG. 5 illustrates a hierarchy of attributes and preferences that correspond to the service pages shown in FIG. 3.

FIG. 6 is a flow diagram of a subscriber authentication method used in conjunction with the personal voice web services shown in FIG. 3.

FIG. 7 is a flow diagram of an enhanced speech recognition process used in the personal voice web systems shown in FIG. 3.

FIG. 8 is a flow diagram of a query customization process in accordance with the present invention.

FIG. 9 is a flow diagram of a voice publishing method in accordance with the present invention.

FIG. 10 is a system diagram of a business-yellow-order page system in accordance with the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

The figures depict a preferred embodiment of the present invention for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein.

System Description

FIG. 1 is a functional block diagram of a voice web system 100 in accordance with the present invention. Voice web system 100 extends the conventional internet and world wide web ("web" or "www") technology to voice and speech processing applications and also enables new uses for interactive voice response technology. Voice web system 100 includes one or more voice web sites 102 coupled to one or more voice web gateways 105 via the Internet 101. Voice web sites 102 and voice web gateways 105 transfer files over Internet 101 in accordance with the hypertext transfer protocol (HTTP). A subscriber 107 accesses the voice web system 100 by coupling to the gateway 105 using a telephone 111 through the public switched telephone network (PSTN) 109.

Internet 101 is a system of linked communications networks that facilitate communication among computers which are coupled to internet 101. Generally, internets such as Internet 101 facilitate communication by providing file transfer, electronic mail and news group services. Internet 101 is preferably the Internet which evolved from the ARPANET and which is publicly accessible world wide. It should be understood, however, that the principles of the present invention apply to other internets and even closed (private) networks such as corporate intranets.

It should be noted that system 100 may include numerous voice web sites 102 and numerous voice web gateways 105. However, only a single voice web site 102 and a single voice web gateway 105 are shown in FIG. 1 to keep the figure uncluttered. Thus, voice web system 100 is a collection of voice web gateways 105 and voice web sites 102 connected over internet 101, enabling subscribers 107 to access voice web pages 103 via their telephones as shown in FIG. 1.

A voice web page 103 is a web page specified using a navigable markup language that includes voice extensions. A navigable markup language is an enhanced type of markup language that facilitates publication, navigation and access of information stored in documents specified in the navigable markup language. An exemplary markup language is the Hypertext Markup Language 2.0, RFC1866, HTML working group of the Internet Engineering Task Force, Sep. 22, 1995, edited by D. Connolly, published on the www at the following uniform resource locator (URL) address: http://w3.org/pub/www/Markup/html-spec.

A markup language is a language that includes a set of 
conventions for marking portions of a document so that, 
when accessed by a parsing program such as a web browser, 
each marked portion is presented to a user with a distinctive 
format. In contrast to formatting codes used by word processing programs, markup language codes, called tags, do not specify exactly how the tagged portion should be presented. Instead, the tags inform the web browser (parser) that the information belongs to a certain portion of a document, such as a title, heading, form or text. The web browser (parser) determines how to present the tagged information.

A navigable markup language is an enhanced markup language that uses tags that are anchors and that are links. When these link and anchor tags are invoked, the user is presented with another navigable markup language document in accordance with the link and anchor tags. This link is sometimes called a hyperlink. A hyperlink is a reference to another markup language document which, when invoked, facilitates access of the referenced markup language document.

A navigable markup language thus uses attributes, tags 
and values that enable (i) a publisher to specify the presen- 
tation of information to a user; (ii) a user to interactively 
access the stored information; and (iii) a user to access other 
navigable markup language documents using hyperlinks. 

The navigable markup language used to specify voice 
web pages 103 is Hyper Voice Markup Language (HVML). 
HVML is a version of HTML that includes voice extensions
as described in Appendix A, incorporated herein by refer- 
ence. Voice web pages 103 include HVML tags and 
attributes that extend HTML to facilitate publication, navi- 
gation and access to voice information. For example, HVML 
specifies functions and protocols that facilitate voice and 
speech processing including voice authentication, speaker 
dependent speech recognition, voice information publishing 
(e.g. creating a voice form) and voice navigation. 

Just as conventional web documents are displayed for the user, voice web documents 103 are "played" to a subscriber over a telephone. A voice web page 103 is played (by voice web browser 106) by sequentially presenting the embedded voice components according to the HVML and MIME specifications.

While a conventional web site enables on-demand access over an internet to conventional web pages, voice web site 102 enables on-demand access to voice web pages 103. Voice web site 102 is a computer that hosts voice web pages 103 and serves them up to other computers (e.g. voice web gateway 105). More specifically, voice web server 102 is a computer configured with conventional web server software 112 and which has access to stored voice web pages 103. Voice web site 102 optionally also includes a subscriber directory 104 that stores a list of registered system subscribers. Voice web site 102 stores, serves and manages voice web pages 103 and can execute associated external scripts or programs in accordance with the present invention. These external scripts and programs interface with databases and other information sources both internal and external to web site 102.

Voice web gateway 105 is a computer connected to the internet 101. Voice web gateway 105 also includes a conventional voice telecommunications interface 114 for coupling to the public switched telephone network (PSTN) 109 for telephonic communications with a subscriber 107. Telephone 111 is any voice enabling telecommunications device. Exemplary telephones include conventional desktop telephones, portable telephones, cellular telephones, analog telephones, digital telephones, smart phones and a computer configured to operate as a telephone and perform telephonic functions. Thus, voice web pages 103 are universally accessible from any ordinary telephone 111. Alternatively, a subscriber 107 may access voice web pages 103 either by using a subscriber interface local to voice web gateway 105 (i.e. a direct user interface with voice web gateway 105) or by dialing into voice web gateway 105 using another computer such as a personal digital assistant or a smart phone.

Voice telecommunications interface 114 serves as an 
interface between a voice web browser 106 and telephone 
111 and preferably includes conventional telephony and 
voice processing hardware and software enabling voice web 
gateway 105 to receive and answer telephone calls, respond 
to touch tone and voice commands, route and conference 
calls, play voice prompts and record voice messages. 

Voice web gateway 105 additionally hosts a voice web 
browser 106. Voice web browser 106 is a computer program 
capable of accessing and processing voice web pages 103 in 
response to a request placed by subscriber 107. More 
specifically, voice web browser 106 (i) processes voice and 
touch tone activated subscriber commands, (ii) retrieves 
requested voice web pages 103 from the appropriate voice 
web site 102, (iii) interprets the embedded markup language 
(HVML) in the retrieved voice web page 103 and (iv) 
delivers the contents of a voice web page 103 to a subscriber 
107 over the telephone 111. In performing the above- 
mentioned processing, voice web browser 106 executes 
scripts, including "voice scripts" embedded in a voice web 
page 103. Voice web browser 106 provides a subscriber 107 
with fast, easy, convenient voice activated navigation and 
access to voice web pages 103. 

Voice web browser 106 is a conventional web browser 
modified with appropriate voice information playback and 
recording extensions and enhancements. Appendix A 
includes a specification of HVML and voice web browser 
commands and is incorporated herein by reference. 

Some voice web pages 103 contain references to scripts and programs that operate as service agents 110 to respond to subscriber requests as well as external events and carry out prescribed actions. These scripts and programs are externally stored on voice web sites 102 (for example as Common Gateway Interface (CGI) scripts or Internet Server Application Programming Interface (ISAPI) programs). These external scripts and programs execute in the voice web server 102 environment as a service agent 110. The external scripts and programs that comprise service agents 110 are referred to by URLs embedded in an associated voice web page 103. In the case of a voice web page 103 that is a voice form, the script or program associated with the service agent executes in response to voice form submission by a subscriber 107. Service agents 110 follow standard Internet protocols such as HTTP, and conform to conventional formats such as MIME and application programming interfaces (APIs) such as CGI and ISAPI.
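A service agent of this kind is an ordinary server-side program that receives the submitted form over HTTP and answers with another page. The sketch below uses Python's WSGI interface as a stand-in for the CGI and ISAPI interfaces named above; it is an illustration under assumptions, not the patent's implementation. The form field `city`, the weather wording and the `<voice type="text">` markup in the reply are hypothetical, since the exact HVML syntax is defined in Appendix A.

```python
import io
from html import escape
from urllib.parse import parse_qs


def voice_form_agent(environ, start_response):
    """Hypothetical service agent handling a submitted voice form.

    Reads the URL-encoded form body, looks up an assumed "city" field and
    replies with an HTML page carrying an assumed HVML voice element.
    """
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length).decode("utf-8")
    fields = parse_qs(body)
    city = escape(fields.get("city", ["unknown"])[0])
    page = (
        "<html><body>"
        f'<voice type="text">Weather report for {city}.</voice>'
        "</body></html>"
    )
    payload = page.encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "text/html; charset=utf-8"),
         ("Content-Length", str(len(payload)))],
    )
    return [payload]


if __name__ == "__main__":
    seen = {}

    def start_response(status, headers):
        seen["status"] = status

    environ = {
        "REQUEST_METHOD": "POST",
        "CONTENT_LENGTH": "11",
        "wsgi.input": io.BytesIO(b"city=Boston"),
    }
    reply = b"".join(voice_form_agent(environ, start_response)).decode("utf-8")
    print(seen["status"], reply)
```

Since HVML pages are HTML pages, the reply can be served with an ordinary HTML content type by any conventional web server that hosts the agent.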

HVML Description 

Conventional web pages are designed primarily for presentation on a computer color monitor and navigation by a mouse and keyboard. As such, graphics, images and text are the primary media types supported widely. Although audio, video and 3-dimensional graphics extensions are becoming available, these extensions are directed primarily at computer users and not telephone users.

Voice web pages 103 consist of HTML pages that have 
been extended with Hyper Voice Markup Language 
(HVML) for easy and effective navigation and access of
voice information via a voice activated device such as an 
ordinary telephone. Voice web pages 103 retain all the 
properties and behavior of conventional HTML pages such 
as HTML markup tags, universal identifiers (URLs), and 
hyper-links and can be accessed by a conventional web 
browser using HTTP protocols from a conventional web 
server. The additional markup tags are interpreted by an 
HVML extended web browser to enable subscribers 107 to 
navigate and access voice web pages 103 over the phone or 
similar voice activated device. Appendix A includes a speci- 
fication of HVML and voice web browser commands and is 
incorporated herein by reference. 

HVML web pages (voice web pages 103) are specially designed for presentation using an ordinary telephone 111 and navigation using touch tones and voice commands. This is in contrast to conventional multimedia web pages that may embed audio data to be presented on a multimedia personal computer using its speakers and navigated using its mouse, keyboard and microphone. Although HVML voice web pages 103 can be embedded in generic multimedia web pages, thus sharing some of the information, they are designed to be presented using an ordinary phone and navigated using commands generated by touch tone signals and speech recognition.

An HVML web page (voice web page 103) is first and foremost an HTML page. Each web page 103 has a unique universal resource locator (URL) (also called uniform resource locator). A URL is a string of characters that uniquely identifies an internet resource, including an identification of (i) the access protocol to be used; (ii) an indication of resource type; and (iii) an identification of its location in the computer network. For example, the fictitious URL http://www.voiscorp.com/banner.gif identifies a www document and uniquely identifies the location of a resource on the world wide web computer network: "http://" indicates the access protocol, "www.voiscorp.com" is the domain name of the computer on which the resource is located, "banner" is the name of the resource located on the computer specified by the domain name, and "gif" indicates that the banner resource is a GIF (Graphics Interchange Format) type resource. Similarly, the fictitious URL http://www.voiscorp.com/voicememo.hvml uniquely identifies the location of a voice web page 103. In this example, "voicememo" is the name of the resource located on the computer specified by the domain name, and "hvml" indicates that the voicememo resource is an hvml type resource. Thus, web pages 103 are each uniquely identified by their corresponding URL. Once located, a web page 103 can be created, edited and played using existing web publication tools; it can be stored on any conventional web server anywhere on the Internet; it can be accessed by any conventional web browser and presented on a computer monitor; it can be navigated using the computer's mouse, keyboard and (with some additional plug-ins) microphone; and it can contain embedded anchors and hyperlinks to other HTML pages, including other HVML pages.
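The URL breakdown described above can be reproduced with Python's standard URL parser. This minimal sketch assumes, as the example in the text does, that the resource type is carried by the file extension; that is a naming convention rather than something URL syntax guarantees, and the `describe_url` helper is hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def describe_url(url: str) -> dict[str, str]:
    """Break a URL into the parts named in the text.

    The resource type is read from the file extension, which is a naming
    convention rather than something the URL syntax itself guarantees.
    """
    parts = urlsplit(url)
    path = PurePosixPath(parts.path)
    return {
        "protocol": parts.scheme,
        "domain": parts.netloc,
        "resource": path.stem,
        "type": path.suffix.lstrip("."),
    }


print(describe_url("http://www.voiscorp.com/voicememo.hvml"))
# {'protocol': 'http', 'domain': 'www.voiscorp.com', 'resource': 'voicememo', 'type': 'hvml'}
```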

Voice web pages 103 are designed for three primary purposes: (i) presenting structured voice information to a user; (ii) enabling the user to navigate across and within voice pages; and (iii) capturing user input for information queries or submission.

a. HVML Presentation 

Presentation of voice information is accomplished prima- 
rily by the voice tag. The voice tag has a type attribute which 
specifies the type of voice information to be presented. If the 
type attribute has the file value, the voice information is 
obtained from a voice file specified by its URL. If the type 
attribute has the text value, the voice information is synthe- 
sized from the specified text. If the type attribute has the
number, ordinal, currency, date or character value, then the
voice information is generated by concatenating voice frag- 
ments from a pre-recorded indexed system voice file. If the 
type attribute has the stream value, then the voice informa- 
tion is obtained from the voice stream specified by its URL. 
Composition of several voice elements into a seamless voice 
string is accomplished by the voice-string tag. 

Combining these tags, publishers can compose and 
present: (i) pre-recorded voice prompts and messages; (ii) 
voice prompts generated using text-to-speech technology; 
and (iii) pre-formatted voice prompts with dynamic speech
synthesis elements.
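The type-attribute rules above amount to a dispatch on a single attribute. The sketch below is a minimal illustration, assuming each voice element reduces to a `type` and a `value`; the `Voice` class, the `plan` and `plan_voice_string` helpers and the recording URL are hypothetical, and the functions only describe the audio source a browser would use rather than producing audio.

```python
from dataclasses import dataclass

# Types whose audio is concatenated from a pre-recorded, indexed system voice file
FRAGMENT_TYPES = {"number", "ordinal", "currency", "date", "character"}


@dataclass
class Voice:
    """One voice element; `value` is a URL for file/stream types, else text or a value."""

    type: str
    value: str


def plan(voice: Voice) -> str:
    """Describe how a browser might produce audio for one voice element."""
    if voice.type in ("file", "stream"):
        return f"fetch {voice.type} {voice.value}"
    if voice.type == "text":
        return f"text-to-speech: {voice.value}"
    if voice.type in FRAGMENT_TYPES:
        return f"concatenate {voice.type} fragments: {voice.value}"
    raise ValueError(f"unsupported voice type: {voice.type!r}")


def plan_voice_string(elements: list[Voice]) -> list[str]:
    """A voice-string plays its elements back to back as one seamless prompt."""
    return [plan(element) for element in elements]


prompt = [
    Voice("file", "http://www.voiscorp.com/balance.wav"),  # hypothetical recording URL
    Voice("currency", "125.40"),
    Voice("text", "Press pound to continue."),
]
for step in plan_voice_string(prompt):
    print(step)
```

The example combines all three prompt styles listed above: a pre-recorded prompt, a dynamically generated currency value and synthesized text.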

b. HVML Navigation 

Navigation of voice web pages 103 is primarily accom- 
plished by extending the HTML anchor tag with new 
attributes: tone and label. These attributes are used in
conjunction with the existing href attribute in an anchor 
element that makes the anchor into a hyper link. When the 
user selects the touch tone signals specified by the value of 
the tone attribute or utters the word specified by the label 
attribute, the browser invokes the corresponding hyper link. 
The tone and label attribute values must be unique within a 
page. Navigation is also accomplished by system commands 
such as next, previous, reload, home, bookmarks, help, fax, 
and history which are invoked by specific touch tone 
sequences or utterance of the words. Users can control the 
voice browser operations by issuing system commands such 
as stop, start, play, pause, exit, backup, and forward. Using 
these attributes, publishers can enable (i) touch tone com- 
mand and control and link navigation; (ii) pre-defined, 
system and user specific, spoken command and control key 
word recognition; and (iii) page and user specific spoken 
command and control key word recognition. 
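Link selection by tone or label can be sketched with the standard library HTML parser, which accepts the extra anchor attributes without complaint. This is a minimal sketch, assuming attribute values are matched exactly as written (no case folding) and that the touch-tone sequence arrives without its terminating "#"; the `AnchorIndex` class, the `follow` helper and the page file names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class AnchorIndex(HTMLParser):
    """Index a page's hyperlinks by their tone and label attributes.

    Enforces the rule that tone and label values are unique within a page.
    """

    def __init__(self) -> None:
        super().__init__()
        self.by_tone: dict[str, str] = {}
        self.by_label: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:  # without href the anchor is not a hyper link
            return
        for name, index in (("tone", self.by_tone), ("label", self.by_label)):
            value = attributes.get(name)
            if value is None:
                continue
            if value in index:
                raise ValueError(f"duplicate {name} value {value!r} on page")
            index[value] = href


def follow(index: AnchorIndex, tones: Optional[str] = None,
           word: Optional[str] = None) -> Optional[str]:
    """Return the href selected by a touch-tone sequence or a recognized spoken word."""
    if tones is not None:
        return index.by_tone.get(tones)
    if word is not None:
        return index.by_label.get(word)
    return None


page = (
    '<a href="calendar.hvml" tone="1" label="calendar">Calendar</a>'
    '<a href="mail.hvml" tone="2" label="mail">Voice mail</a>'
)
index = AnchorIndex()
index.feed(page)
print(follow(index, tones="2"))        # mail.hvml
print(follow(index, word="calendar"))  # calendar.hvml
```

Checking uniqueness while indexing means a page that violates the rule is rejected when it is parsed, instead of producing an ambiguous selection later in the call.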

c. HVML Forms 

HVML uses the form tag to enable user input similar to 
HTML including the method attribute which specifies the 
way parameters are passed to the server and the action 
attribute which specifies the procedure to be invoked by the 
server to process the form. HVML extends the input tag
within forms by introducing the voice-input tag. Voice-input
takes a type attribute similar to the input tag, with three new
values, "voice", "tone" and "review", in addition to the
existing "reset" and "submit" values. The HVML browser
pauses at each voice-input statement in an HVML form until
the specified input is supplied or input is terminated, before
processing the remaining form. Using these tags and
attributes, publishers can enable: (i) touch tone command 
and control and parameter input; (ii) pre-defined, user 
specific, spoken alphabet and digit input; (iii) page and user 
specific, spoken key word and proper names input; and (iv)
free form voice information input. 
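The pause-per-field behavior can be sketched as a loop over the form's voice-input elements, with a callback standing in for the caller speaking or keying each value. This is an illustration under assumptions: the element and attribute spellings follow the prose above but the exact syntax is defined in Appendix A, the action URL `/cgi-bin/memo` and the field names are hypothetical, and the "review" behavior (playing collected input back) is left out.

```python
from html.parser import HTMLParser
from typing import Callable, Optional


class VoiceFormReader(HTMLParser):
    """Collect a form's action URL and its voice-input elements in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.action: Optional[str] = None
        self.inputs: list[dict] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            self.action = attributes.get("action")
        elif tag == "voice-input":
            self.inputs.append(attributes)


def run_voice_form(page: str, get_input: Callable[[str, str], Optional[str]]):
    """Walk the voice-input fields, pausing for each one via get_input.

    get_input(name, kind) stands in for the caller speaking or keying a value;
    returning None models input being terminated before the form is submitted.
    Returns (action, collected values, submitted flag).
    """
    reader = VoiceFormReader()
    reader.feed(page)
    values: dict[str, str] = {}
    for field in reader.inputs:
        kind, name = field.get("type"), field.get("name")
        if kind in ("voice", "tone"):
            value = get_input(name, kind)  # the browser pauses here
            if value is None:
                return reader.action, values, False
            values[name] = value
        elif kind == "reset":
            values.clear()
        elif kind == "submit":
            return reader.action, values, True
        # "review" (playing collected input back to the caller) is omitted here
    return reader.action, values, False


page = (
    '<form method="post" action="/cgi-bin/memo">'
    '<voice-input type="tone" name="account">'
    '<voice-input type="voice" name="message">'
    '<voice-input type="submit">'
    "</form>"
)
answers = {"account": "4711", "message": "<recorded audio>"}
print(run_voice_form(page, lambda name, kind: answers[name]))
```

On submission, the collected values would be sent to the form's action URL, where a service agent processes them.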

Operational Description of the Voice Web Browser 

Syntactic and structural intelligence, such as in-line pre-recorded voice prompts, pre-formatted voice prompts with dynamically generated voice elements, key word accessible anchor elements, voice responsive hyper links, etc., is embedded in voice web pages 103 through voice access extensions to HTML. Behavioral intelligence, including command interpretation, page access, file caching, HVML interpretation and user interaction, is embedded in voice web browser 106 (the HVML browser). Voice web browser 106 has the following states: (i) waiting for user commands; (ii) actively accessing and playing HVML pages; and (iii) paused for user input.

Initially, voice web browser 106 is launched upon the 
system's receipt of a subscriber's telephone call. Once 
launched, voice web browser 106 goes through an initial- 
ization sequence that includes subscriber authentication and 
normally becomes "active" accessing and playing the sub- 
scriber's home page. Once the home page is played, voice 
web browser 106 "waits" for subscriber commands. As part 
of playing the page, the browser may "pause" for subscriber 
input and continue once the input is provided. 
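The three browser states and the transitions described above can be sketched as a small state machine. The event names (`subscriber_authenticated`, `input_required` and so on) are hypothetical labels chosen for this illustration. `None` stands for the moment just after launch, before initialization has finished.

```python
from enum import Enum


class BrowserState(Enum):
    WAITING = "waiting for user commands"
    ACTIVE = "accessing and playing HVML pages"
    PAUSED = "paused for user input"


# (current state, event) -> next state. The event names are invented labels
# for the transitions described above; None is the state right after launch.
TRANSITIONS = {
    (None, "subscriber_authenticated"): BrowserState.ACTIVE,  # play home page
    (BrowserState.ACTIVE, "input_required"): BrowserState.PAUSED,
    (BrowserState.PAUSED, "input_received"): BrowserState.ACTIVE,
    (BrowserState.ACTIVE, "page_finished"): BrowserState.WAITING,
    (BrowserState.WAITING, "command_received"): BrowserState.ACTIVE,
}


def replay(events):
    """Apply events in order and return the list of states entered."""
    state, history = None, []
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {state}")
        state = TRANSITIONS[key]
        history.append(state)
    return history
```

A table-driven design keeps the allowed transitions in one place, so an unexpected event, such as input arriving while the browser is idle, is rejected explicitly instead of being silently ignored.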

Independent of any specific voice web page 103 that a 
subscriber may be accessing, voice web browser 106 pro- 
vides a set of navigational and operational commands. 
Within the telephone key pad, "*" and "#" are special keys
that generate unique tones, and voice web browser 106 assigns
special meaning to these keys. In general, the "*" key
followed by a sequence of touch tones, excluding the "#"
key, signals a browser command, an escape or a skip, and the
"#" key signals a link activation, termination of form input,
termination of a key sequence or a selection.
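One possible way to classify collected touch-tone sequences under these conventions is sketched below. The text does not specify the exact grammar, so the following rules are assumptions: a trailing "#" terminates a sequence, a bare "*" means escape/skip, and a sequence ending in "#" without a leading "*" is a selection or link activation.

```python
def classify_keys(sequence):
    """Classify a DTMF key sequence using the "*"/"#" conventions above.

    Assumed rules (not specified in the text):
      "*" + digits [+ "#"]  -> ("command", digits)
      "*" alone or "*#"     -> ("escape", "")
      digits + "#"          -> ("select_or_terminate", digits)
      anything else         -> ("pending", sequence), i.e. still collecting
    """
    if not sequence:
        raise ValueError("empty key sequence")
    if sequence.startswith("*"):
        body = sequence[1:]
        if body.endswith("#"):
            body = body[:-1]  # "#" terminates the key sequence
        if "#" in body:
            raise ValueError("'#' may not appear inside a '*' command")
        return ("command", body) if body else ("escape", "")
    if sequence.endswith("#"):
        return ("select_or_terminate", sequence[:-1])
    return ("pending", sequence)
```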

Voice Web Services 

Voice web system 100 can be used to provide voice web 
services to a subscriber 107. A voice web service is a service 
that provides on-line telephone based access to information.
The information is presented to the user through the
publication of voice web pages 103. The information presented to
(published for) the subscriber may be information retrieved
from a single information source or a combination of
information sources, including publicly accessible on-line
databases, information proprietary to voice web system 100,
information previously stored by subscriber 107 or another
information source. Exemplary services provided by voice
web system 100 include (i) personal information services
such as calendar, address book, electronic mail and voice mail;
(ii) information services such as headline news, weather
reports, sports scores, stock portfolio quotes, business white
pages, yellow pages and classified information; and (iii)
transaction services (commerce services) such as banking, bill
payments, stock trading, airline, hotel and restaurant
reservations and catalog store orders.

Users gain access to voice web services by becoming 
voice web subscribers 107. Subscribers 107 preferably sign
up (i.e., register) for services through a service provider. In
one embodiment, each subscriber 107 is assigned a unique
account number on a calling card, and subscribers 107 access
the voice web system 100 by dialing a single "800" (i.e., toll
free) service phone number and then supplying their
account number via the telephone 111. In an alternative
embodiment, the services are publicly available and any user
placing a call into the system is processed as a subscriber
107 without requiring any registration.
FIG. 2A is a functional block diagram of a voice web
system 200 configured to provide voice web services to a 
subscriber 107. Voice web system 200 includes one or more 
voice web gateways 105 coupled to one or more service sites 
202 via internet 101. Service site 200 is a voice web site 102 
configured to provide voice web services. Each voice web 
service is implemented using a collection of service agents 
201 and service pages 203 centered around a service data- 
base 202. Additionally, service site 200 optionally includes 
a personal profile 204 to be used to the extent that the service 
being provided requires pre-stored subscriber-specific infor- 
mation (i.e. pre-stored information personal to the particular 
subscriber). 

Voice web service agents 201 are a type of service agent
110 (shown in FIG. 1) that execute on service site 102 to
provide voice web services to a subscriber 107. Voice web
service agents 201 are therefore scripts and programs
represented by a web page 103 (shown in FIG. 1).

Service database 202 is a database of service information. 
The content of the service information varies with the type 
of service being provided. For example, if voice web system 
100 is configured to deliver a business white page service, 
then service database 202 is a database of address and phone 
number listings for businesses. If voice web system 100 is 
additionally or alternatively configured to deliver news 
headlines, then voice web system 100 includes a service 
database 202 that includes current news headlines. 

Service forms and pages 203 are voice web pages 103 that
are HVML templates (voice forms and pages) that are "filled
in" in response to a specific subscriber request. Service
pages and forms 203 are used to gather subscriber input, to
retrieve information and to deliver (publish) information to
a subscriber. Some service pages 203 are database entry and
administration forms, some are database query forms and
others are database response pages. Entry forms are used to
add information to the database. Query forms are used to
extract information from the database. Response pages are
used to present retrieved information to the user. In the
preferred embodiment, service agents dynamically generate
service pages and forms 203 by retrieving requested data
from service database 202 and using the retrieved data in
place of corresponding variables stored in an HVML
template. The HVML templates link to each other, specifying
request-response dependencies. Thus, subscribers 107 are
able to enter and retrieve information in personal and
external databases over internet 101 using web protocols
without having to create a voice web page for each entry in
service database 202.
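The template-filling step can be sketched with Python's `sqlite3` and `string.Template`. The sketch uses an in-memory table in place of service database 202 and a single template with `$`-placeholders in place of the HVML template variables. The markup, table layout and listings are hypothetical. The point it illustrates is that one template yields a response page for every matching row, so no page is stored per database entry.

```python
import sqlite3
from string import Template

# Hypothetical HVML response-page template; the markup is illustrative, the
# $-placeholders play the role of the template variables described above.
LISTING_TEMPLATE = Template(
    "<hvml><body><p>$name, $address. Phone $phone.</p></body></hvml>"
)


def build_service_db():
    """In-memory stand-in for a white pages service database 202."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE listings (name TEXT, address TEXT, phone TEXT)")
    db.executemany(
        "INSERT INTO listings VALUES (?, ?, ?)",
        [("Acme Tools", "12 Main St, Springfield", "555-0100"),
         ("Zenith Books", "9 Elm Ave, Shelbyville", "555-0199")],
    )
    return db


def response_pages(db, partial_name):
    """One filled-in page per matching row; no page is stored per entry."""
    rows = db.execute(
        "SELECT name, address, phone FROM listings WHERE name LIKE ? ORDER BY name",
        (f"%{partial_name}%",),
    ).fetchall()
    return [
        LISTING_TEMPLATE.substitute(name=n, address=a, phone=p) for n, a, p in rows
    ]
```

`Template.substitute` raises `KeyError` if a placeholder has no value, which surfaces template and schema mismatches early.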

Service agent 201 typically uses a service database 202 
and a set of service pages and forms 203 to provide the 
corresponding voice web service. The service database 202 
hosts the information that subscribers 107 wish to access. 
The service forms allow subscribers 107 to input and query 
information in service database 202. Service pages allow 
service agents 201 to present the requested information to 
the subscriber 107 using voice web browser 106. 

FIG. 2B is a functional block diagram of an exemplary 
calendar service. The calendar service agent 210 uses the 
calendar database 211 together with the calendar and 
appointment details input and query voice web forms 212 
and appointment list and details voice web pages 213. 
Subscribers fill in the calendar and appointment details input
voice web forms 212 to set their calendar appointments and
their details. The calendar service agent 210 processes the
submitted form and updates the calendar service database
211. Later, subscribers can retrieve their appointments for
any day by supplying 214 the month, date and year for that
day in the calendar query voice web form 212. The calendar
service agent 210 processes the submitted form, retrieves the
matching appointments from the calendar database, and
dynamically composes and returns the appointment list
voice web page 213. If the subscriber requests the details
of any appointment, the calendar service agent 210
dynamically generates and supplies the corresponding
appointment details page 213.
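The calendar request-response cycle can be sketched as follows. A submitted query form is modeled as a plain dictionary of strings, and the list and details pages are modeled as lines of text to be spoken. The field names (`month`, `date`, `year`), the `Appointment` record and the sample data are assumptions for this sketch; in the system described, the pages would be HVML rendered by the voice web browser.

```python
from dataclasses import dataclass
from datetime import date, time


@dataclass
class Appointment:
    day: date
    start: time
    title: str
    note: str = ""  # stands in for a voice-note annotation


# Invented contents standing in for calendar database 211.
CALENDAR = [
    Appointment(date(1997, 11, 10), time(14, 0), "Budget review"),
    Appointment(date(1997, 11, 10), time(9, 30), "Dentist", "Bring insurance card."),
    Appointment(date(1997, 11, 11), time(8, 0), "Flight to Boston"),
]


def _matching(form):
    """Appointments for the month/date/year fields of a query form, by time."""
    day = date(int(form["year"]), int(form["month"]), int(form["date"]))
    return day, sorted((a for a in CALENDAR if a.day == day), key=lambda a: a.start)


def appointment_list_page(form):
    """Compose the appointment-list page as a list of lines to be spoken."""
    day, matches = _matching(form)
    if not matches:
        return [f"You have no appointments on {day:%B} {day.day}."]
    return [f"{i}. {a.start:%H:%M} {a.title}" for i, a in enumerate(matches, 1)]


def appointment_details_page(form, choice):
    """Compose the details page for the 1-based `choice` in the list."""
    _, matches = _matching(form)
    a = matches[choice - 1]
    return f"{a.title} at {a.start:%H:%M}. {a.note or 'No notes.'}"
```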

The Personal Voice Web 

FIG. 3 shows a personal voice web 300 in accordance
with the present invention. Personal voice web 300 is a
standardized collection of linked voice web pages and voice
web forms (a special type of voice web page) that form a
personal service space for the subscriber. Preferably, all
subscribers share a common structure of linked voice web
pages, although the contents of personal voice web pages
vary from subscriber to subscriber. Because each subscriber
of the personal voice web system 300 has the linked page
structure shown in FIG. 3, subscribers navigate about and
access information in their personal voice web 300 in a
standardized way. Each page in personal voice web 300
includes an agent that performs the processing tasks
required for that page. At the root of personal voice web
300 is the personal home page 301. Personal home page 301
links to a personal profile page 302, a personal
administrative assistant page 303, a personal helpdesk page
304, and a personal commerce page 305.

The personal administrative assistant page 303 is linked to
a number of personalized voice web services (service pages)
330 including, by way of example, a calendar and
appointments page 309, an address book page 310, a stock
portfolio page 311, a news headlines page 312, a mail box
page 313, and a business white pages home page 314.

Calendar and appointments page 309 is used to provide an
appointments service. The appointments service enables a
subscriber to track personal and business appointments in a
voice-based calendar. The subscriber thus adds and retrieves
appointments over the phone using personal voice web 300.
In addition to providing day and time information related to
stored appointments, a subscriber may also store voice note
annotations that are associated with a particular appointment.

Address book page 310 is used to provide an address
service. The address service enables a subscriber to add and
retrieve address, phone number, and other information
related to individual names or company names. The
information added and retrieved is stored in an address book
service database private to the subscriber.

Stock portfolio page 311 is used to provide a stock quote
service. The stock service enables a subscriber to retrieve
current stock pricing and portfolio valuation information as
well as statistical information related to changes in portfolio
or stock positions. The stock service uses information
retrieved from a stock portfolio service database private to
the subscriber and additionally retrieves current stock
pricing information from an on-line database or information
source.

News headlines page 312 is used to provide a news
service. The news service enables a subscriber to retrieve
news headlines related to subscriber-customized topics.

Mail box page 313 is used to provide a mailbox service.
The mailbox service enables a subscriber to access
electronic mail (e-mail) messages. The e-mail messages are
played for the subscriber using text-to-speech conversion and
a speech synthesizer.

Business white pages home page 314 is used to provide a
white page service. The white page service enables a
subscriber to enter a partial company name, and optionally a
city name and state code, to retrieve the company's full name,
address and phone number.

Each service page 309-314 is part of a collection of voice
forms and pages that are used by the corresponding service
agent to retrieve a request from the subscriber, generate an
appropriate database query responsive to the subscriber
request, retrieve subscriber-requested information, and
generate a voice web page that incorporates the retrieved
information and that is adapted for presentation
(publication) to the subscriber using a voice web browser.
Thus, for example, the service agent associated with calendar
and appointments page 309 generates a voice form for
prompting a subscriber for month, day and year information.
After receiving the prompted information, the calendar and
appointments service agent generates the appropriate query
to extract the requested calendar information from a
calendar service database. Once the calendar information is
retrieved from the database, the calendar and appointments
service agent generates a voice web page that includes the
retrieved information. The new page is then presented
(published) to the subscriber over the telephone by the voice
web browser.

Each of the other personal service agents associated with
personal service pages 308-327 operates in a similar way to
provide a subscriber with information retrieved from
associated service databases.

Personal helpdesk page 304 is linked to personal voice
web helpdesk service pages 331 including, by way of
example, a hotels page 315, an airlines page 316, a rental
cars page 317, a travel agents page 318, a restaurants page
319, a financial services page 320, and a banks page 321.
The personal helpdesk page has an associated personal
helpdesk agent that is used to provide a set of helpdesk
services. Helpdesk services enable a subscriber to access
product, pricing, availability and other information about the
corresponding services.

Hotels page 315 is used to provide a hotel reservation
service. Airlines page 316 is used to provide an airline
booking service. Rental cars page 317 is used to provide a
rental car reservation service. Travel agents page 318 is used
to provide a travel service. Restaurants page 319 is used to
provide a menu and reservations service. Financial services
page 320 is used to provide a financial service. Banks page
321 is used to provide a bank service.

Personal commerce page 305 is linked to personal voice
web commerce service pages 332 including, by way of
example, an apparel shops page 322, a luggage stores page
323, a gift shops page 324, a flower shops page 325, an office
supplies stores page 326, and a book stores page 327. The
personal commerce page provides commerce services that
enable a subscriber to access catalogs associated with
various retail establishments. As part of the commerce
service, the personal voice web allows a subscriber to shop
in various catalogs and then submit orders for selected items
directly to the sponsor of the associated catalog. Orders are
submitted to the catalog sponsor either as a voice web form
or conventional web form sent to the sponsor, as an
electronic message, or by another means.

Personal profile page 302 links to a set of personalized
voice web profile pages including an authentication page
306, a speech profile page 307, and an attributes and
preferences page 308. User authentication page 306 contains
authenticating information including a subscriber account
number, an encrypted password or personal identification
number, and links to a voice authentication signature MIME
resource.
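The FIG. 3 link structure can be modeled as a simple parent-to-children map using the reference numerals above. The sketch shows how a page's link path from personal home page 301 can be recovered and turned into an address. The URL scheme and host name are hypothetical, because the text says only that pages are URL-addressable.

```python
# FIG. 3 link structure as parent page -> child pages, using the reference
# numerals from the text.
PERSONAL_VOICE_WEB = {
    301: [302, 303, 304, 305],                 # personal home page
    302: [306, 307, 308],                      # personal profile page
    303: [309, 310, 311, 312, 313, 314],       # administrative assistant
    304: [315, 316, 317, 318, 319, 320, 321],  # helpdesk
    305: [322, 323, 324, 325, 326, 327],       # commerce
}


def path_to(page, root=301):
    """Depth-first search for the chain of links from `root` to `page`."""
    if page == root:
        return [root]
    for child in PERSONAL_VOICE_WEB.get(root, []):
        sub_path = path_to(page, child)
        if sub_path is not None:
            return [root] + sub_path
    return None


def page_url(account, page):
    """Hypothetical URL scheme: one path segment per page on the link path."""
    path = path_to(page)
    if path is None:
        raise KeyError(f"page {page} is not in the personal voice web")
    return f"http://voiceweb.example/{account}/" + "/".join(map(str, path))
```

Because every subscriber shares this structure, the same map serves all subscribers and only the account segment of the address changes.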

Speech profile page 307 is linked to a hierarchy of speech
training pages that correspond to the hierarchy of personal
voice web 300. FIG. 4 shows the hierarchy 400 of speech
training pages 401-427. Speech training pages 401-427 are
sets of pre-captured training files to be used in performing
speaker dependent speech recognition in providing the
corresponding service to a subscriber. Each speech training
page is thus accessed by the corresponding agent in
performing the corresponding service. For example, the
administrative assistant service accesses administrative
speech training set 431 (including speech training pages
409-414). The helpdesk service accesses the helpdesk
training page set 432 (including speech training pages
415-421). The commerce service accesses the commerce
training page set 433 (including speech training pages
422-427).

Each speech training page 401-427 includes training data
specifically tailored to the words most commonly associated
with the corresponding service. For example, the calendar
speech training page 409 includes training vocabulary to aid
in the recognition of voice commands such as "Tenth",
"November", "Tuesday" and so forth.

Referring now again to FIG. 3, personal attributes and
preferences page 308 includes subscriber attribute
information including name, account number, address, voice
telephone number, fax telephone number, paging telephone
number, encrypted credit card numbers and the like, as well
as personal preference information such as configuration,
selection and presentation preferences. Personal attributes
and preferences page 308 is also linked to a hierarchy of
attribute and preferences pages (shown in FIG. 5) that
corresponds to the hierarchy of personal voice web 300.
FIG. 5 shows the hierarchy of attributes and preferences
pages 501-527 associated with personal attributes and
preferences page 308. Attributes and preferences pages
501-527 are pages that store subscriber-specific preference
information to be used in providing the corresponding
service to a subscriber. Each attributes and preferences page
501-527 is thus accessed by the corresponding agent in
performing the corresponding service. For example, the
administrative assistant service accesses attributes and
preferences set 531 (including attributes and preferences
pages 509-514). The helpdesk service accesses the helpdesk
attributes and preferences set 532 (including attributes and
preferences pages 515-521). The commerce service accesses
the commerce attributes and preferences set 533 (including
attributes and preferences pages 522-527).

It should be noted that the user profile information for
multiple subscribers is stored in user profile databases. The
user profile databases are accessed by service dependent
profile agents. For example, personal identification and
verification information of multiple subscribers is stored in
a user profile home page database (a service database) and
accessed by the subscriber's profile home page agent.
Calendar attributes and preferences information for multiple
subscribers is stored in the subscriber calendar attributes and
preferences profile database (a service database). Calendar
service specific speech training information for multiple
subscribers is stored in the subscriber calendar speech
training profile database (a service database). The calendar
service profile agent responds to HTTP form requests for
calendar attributes and preferences or calendar speech
training profile page information for any particular
subscriber and supplies the appropriate subscriber profile
page information as HVML voice web pages.

The collection of profile pages for a single user constitutes
that user's personal voice web profile 300. Personal voice
web profile 300 need not be a collection of static HVML
pages (voice web pages); instead, it can be generated
dynamically using user profile page databases. Once
generated, these profile pages can be reused from various
cache systems within the voice web system without having
to retrieve them from their original databases, thus saving
significant time and resources.

In operation, a personal voice web service agent uses a
corresponding service profile agent to retrieve subscriber
and service specific attributes and preferences, speech
training profiles and other information from the
corresponding service profile database. The personal voice
web service agent uses the retrieved subscriber and service
specific information to personalize the voice web service
forms and pages and to enhance speech recognition by
embedding the speech training profiles in the corresponding
voice web forms and pages.

Referring back to FIG. 2B, for example, the calendar
service agent 210 uses a calendar service profile agent 215
to retrieve subscriber specific calendar attributes and
preferences included in profile database 216 by specifying
the subscriber's calendar attributes and preferences profile
URL as part of a profile request web form. Calendar service
profile agent 215 responds to the submitted web form,
retrieves the requested subscriber information from the
calendar service profile database 216 and delivers it to
calendar service agent 210 as a table formatted web page.
Calendar service agent 210 retrieves the requested
information from the table format in the web page and uses
the subscriber's attributes and preferences to customize the
voice web service form and page templates 213 before
presenting them to the subscriber. In this way, the subscriber
can have a personalized form or page presented to him/her
without having to supply information about himself/herself
repeatedly in each call.

Similarly, calendar service agent 210 uses a corresponding
calendar service profile agent 215 to retrieve subscriber
specific calendar speech training profiles from profile
database 216 by specifying the subscriber's calendar speech
training profile URL as part of a profile request web form.
Calendar service profile agent 215 responds to the submitted
web form, retrieves the requested subscriber information
from the calendar service profile database 216 and delivers
it to the calendar service agent 210 as a table formatted web
page. The calendar service agent 210 retrieves the requested
information from the table format in the web page and
embeds the subscriber's speech training profiles in the voice
web form and page templates (pages 212, 213) before
delivering them to the voice web browser. The voice web
browser uses these speech training profiles to dynamically
change the active vocabulary in the voice processing software
and hardware, thereby customizing it to the subscriber.

FIG. 2C is a functional block diagram of an alternative
configuration of a voice web system in accordance with the
present invention. The system includes a computer
configured as a combined voice gateway and voice web site
(combined site) 220. Combined site 220 includes gateway
components such as a voice and telephony interface 114, a
voice web browser 106 and server software 112. Combined
site 220 additionally includes voice web site components
such as service agents 201, service database 202 and service
forms and pages 203. Combined site 220 provides voice
web access to a subscriber 107 coupled to the combined site
220 via the PSTN 109. Because the voice gateway and voice
web site functions are combined within a single computer
environment, the server software 112 (located in combined
site 220) and the voice web browser 106 exchange files
without suffering the delays imposed by routing across the
Internet 101. In certain applications, for example when a
subscriber is accessing personal databases, this configuration
improves system performance. It should be noted, however,
that even though server software 112 (located on combined
site 220) and voice web browser 106 exchange files using a
local interface as opposed to Internet 101, they nonetheless
exchange files in accordance with HTTP.

Voice web browser 106 communicates with other web
sites (such as web sites 224 and 225) using Internet 101.
Web site 224 is a computer coupled to Internet 101 and
configured with server software 112, service agents 201,
service database 202 and service forms and pages 203. Web
site 224 is configured to deliver voice web services as
described in reference to FIGS. 2A and 2B.
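The profile exchange described above for FIG. 2B works as follows. A service agent submits a profile request naming a profile URL, the profile agent answers with a table-formatted web page, and the service agent parses the table and fills its template. The sketch below models that exchange in-process. The profile URL, attribute names and template text are hypothetical. HTTP transport is omitted, so the "request" is a direct function call.

```python
from html import escape
from html.parser import HTMLParser
from string import Template

# Stand-in for calendar service profile database 216, keyed by profile URL.
# The URL and attribute names are hypothetical.
PROFILE_DB = {
    "http://profiles.example/1234567/calendar/prefs": {
        "name": "Pat",
        "week_starts": "Monday",
    },
}


def profile_agent(request_form):
    """Answer a profile request form with a table-formatted web page."""
    attrs = PROFILE_DB[request_form["profile_url"]]
    rows = "".join(
        f"<tr><td>{escape(k)}</td><td>{escape(v)}</td></tr>" for k, v in attrs.items()
    )
    return f"<html><body><table>{rows}</table></body></html>"


class TableReader(HTMLParser):
    """Collects two-cell table rows into a dict of attribute -> value."""

    def __init__(self):
        super().__init__()
        self.result = {}
        self.row_cells = []
        self.cell_chunks = None  # text chunks of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row_cells = []
        elif tag == "td":
            self.cell_chunks = []

    def handle_data(self, data):
        if self.cell_chunks is not None:
            self.cell_chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self.cell_chunks is not None:
            self.row_cells.append("".join(self.cell_chunks).strip())
            self.cell_chunks = None
        elif tag == "tr" and len(self.row_cells) == 2:
            self.result[self.row_cells[0]] = self.row_cells[1]


def personalize(template_text, profile_url):
    """Service-agent side: request the profile, parse the table, fill the template."""
    page = profile_agent({"profile_url": profile_url})
    reader = TableReader()
    reader.feed(page)
    reader.close()
    # safe_substitute leaves unknown placeholders untouched instead of raising.
    return Template(template_text).safe_substitute(reader.result)
```

Values are HTML-escaped on the way out and unescaped by the parser on the way in, so profile data containing characters such as `&` or `<` survive the round trip.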
Web site 225 is a computer configured with server
software 112, a profile service agent 223, service forms and
pages 222 and profile database 221. Web site 225 is a
universally accessible profile web site that can be accessed by
any other web site or web gateway in the voice web system,
as long as the accessing web site or web gateway has the
appropriate URL information. Web site 225 provides user
profile information to web site agents (such as service agents
201) located on other web sites (such as web site 224 and
combined site 220). Advantageously, any web site and/or
web gateway can thus access information stored in the
profiles database 216 by hyperlinking to the web page
associated with profile service agent 215.

User Authentication and Verification

Personal voice web system 300 uses a login agent as a
gatekeeper controlling access to each subscriber's personal
voice web. The login agent is a distributed software program
that can receive subscriber information over a telephone,
access the subscriber's personal profile pages from the
subscriber's personal voice web and verify the subscriber's
credentials over the telephone.

Each system subscriber is given (i) an account number, (ii)
a personal identification number (PIN) and (iii) a service
calling number. In order to access a personal voice web, the
subscriber calls the service calling number and uses account
information and the PIN to initiate a subscriber
authentication process. FIG. 6 is a flow diagram of a
subscriber authentication method 600 in accordance with the
present invention. The subscriber authentication method 600
includes authentication signature creation form processing
and subscriber authentication processing.

A subscriber initiates access 601 of his or her personal
voice web 300 by calling the service calling number using
a conventional telephone, a similar voice-activated device, or
a computer configured to access the public telephone network.
After the subscriber initiates access 601, a login agent starts
login processing 602.

During login processing 602, the login agent answers the
call and presents a standard login form to the subscriber. A
login form is a voice form for collecting and submitting
login information including the subscriber account number
and the subscriber PIN. After a subscriber enters the login
information (into the login form) and submits the login form,
the login agent uses the login information to retrieve the
URL of the subscriber's personal voice web home page 301.
The login agent retrieves the URL by looking up the
subscriber's account number in the voice web subscriber
directory. The login agent additionally verifies the PIN
that was submitted. Upon verification of the PIN, the login
agent presents 603 the subscriber's voice authentication
form to the subscriber over the telephone. As part of the
presentation, the login agent requests the subscriber to
supply a personalized voice authentication sample. The
login agent then waits 604 for the subscriber to supply the
sample and submit 605 the form. After the subscriber
submits 605 the form, the login agent processes 606 the
submitted form. During processing 606 of the submitted
form, the login agent accesses the subscriber's personal
authentication page from the subscriber's personal voice
web profile (linked to the subscriber's home page) and
attempts to retrieve the voice authentication signature. If this
is the first time the subscriber is accessing the service, the
signature will be missing from the subscriber's
authentication page. In this case, the login agent presents 607
the authentication signature creation form to the subscriber.
Using the options presented in the signature creation form,
the subscriber selects the option to create or modify the
personal voice authentication signature. Following the
instructions provided by the login agent, the subscriber fills 
in 608 the voice authentication signature creation form and 
records a personalized voice phrase as an authentication 
signature. After filling in 608 the signature creation form, the 
subscriber submits the form to the login agent. The login 
agent waits until the signature creation form is submitted 
609. The login agent then processes 610 the recorded phrase 
converting it into a signature pattern and linking it to the user 
authentication page as a MIME resource for future verifi- 
cation. 

If, however, after processing 606, the login agent
determines that there is an authentication signature stored in the
subscriber's personal profile, then the login agent performs a
test 611 to determine whether there is a match between the
stored authentication signature and the voice sample sub- 
mitted by the subscriber. If test 611 determines that there is 
a match between the sample and the signature, then the 
subscriber is given access to the personal voice web and the 
voice web. Test 611 uses conventional voice authentication 
methods. A "match" is determined by test 611 when the 
conventional voice authentication method determines that 
the speaker's voice print or voice signature matches a master 
stored voice print or voice signature within a specified 
tolerance. If, however, the test determines that there is not a 
match between the sample and the signature, then the 
subscriber is denied access 613. 
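The login and voice-verification flow of FIG. 6 can be sketched as below, with three simplifications stated up front. First, cosine similarity between feature vectors against a fixed tolerance is a stand-in for the unspecified "conventional voice authentication methods". Second, an unsalted SHA-256 PIN hash keeps the sketch short, whereas a real system would use a salted, deliberately slow hash. Third, the first sample is enrolled directly, whereas the text has the subscriber record a separate phrase on the signature creation form. The directory contents, vectors and threshold are all hypothetical.

```python
import hashlib
import hmac
import math

# Hypothetical subscriber directory. Feature vectors stand in for whatever
# voice-print representation a conventional authentication method produces.
DIRECTORY = {
    "1234567": {
        "pin_sha256": hashlib.sha256(b"4321").hexdigest(),
        "home_url": "http://voiceweb.example/1234567/301",
        "signature": None,  # no voice signature enrolled yet
    },
}
MATCH_TOLERANCE = 0.9  # assumed minimum similarity that counts as a match


def similarity(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    norm = math.hypot(*a) * math.hypot(*b)
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def authenticate(account, pin, voice_sample):
    """Return "enrolled", "granted" or "denied", loosely following FIG. 6."""
    entry = DIRECTORY.get(account)  # directory lookup (602)
    if entry is None:
        return "denied"
    pin_hash = hashlib.sha256(pin.encode()).hexdigest()
    if not hmac.compare_digest(pin_hash, entry["pin_sha256"]):
        return "denied"
    if entry["signature"] is None:  # first access: create signature (607-610)
        entry["signature"] = tuple(voice_sample)
        return "enrolled"
    if similarity(entry["signature"], voice_sample) >= MATCH_TOLERANCE:  # test 611
        return "granted"
    return "denied"  # access denied (613)
```

`hmac.compare_digest` is used for the PIN comparison so that the comparison time does not leak how many leading characters matched.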

Enhanced Speech Recognition 

Automatic speech recognition falls into three categories:
speaker dependent, speaker adaptive, and speaker
independent. A speaker dependent system is developed to work
for a single speaker; such systems are usually easier to
develop, cheaper to buy and more accurate, but they require
the use of user-specific speech training files.

The size of the vocabulary of a speech recognition system 
affects the complexity, processing requirements and the 
accuracy of the system. Referring now again to FIG. 3, 
personal voice web 300 uses small to medium sized
vocabularies (tens to hundreds of words).

An isolated-word or discrete speech system operates on 
one word at a time, requiring a pause between each word
utterance. This conventional type of speech recognition is a 
simple form of recognition to perform because the end 
points are easier to find and the pronunciation of a word 
tends not to affect others. As the occurrences of the words 
are more consistent and sharply delimited, they are easier to
recognize. Personal voice web 300 focuses on discrete 
speech and in particular on speech used for command and 
control. 
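Isolated words are easier to delimit because silence separates them. A common way to find the end points is to threshold short-time energy, and a minimal version is sketched below. The frame length of 80 samples (10 ms at the 8 kHz rate used in this system) and the energy threshold are assumed values. A practical detector would also enforce minimum word and pause durations and adapt the threshold to background noise.

```python
def find_endpoints(samples, frame_len=80, threshold=0.01):
    """Locate word start/end points by short-time energy.

    samples:   amplitudes scaled to [-1, 1]
    frame_len: samples per analysis frame (80 = 10 ms at 8 kHz, assumed)
    threshold: mean squared amplitude separating speech from silence (assumed)
    Returns (start_sample, end_sample) pairs, one per detected word.
    """
    words, start = [], None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if energy >= threshold and start is None:
            start = i * frame_len  # speech begins
        elif energy < threshold and start is not None:
            words.append((start, i * frame_len))  # pause ends the word
            start = None
    if start is not None:  # utterance ran to the end of the buffer
        words.append((start, n_frames * frame_len))
    return words
```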

Personal voice web 300 typically uses speech coded at 8 
kHz using 8 bit samples resulting in 64 kbps bandwidth and 
storage. Conventional adaptive differential pulse code
modulation (ADPCM) techniques can reduce the bandwidth to
16 kbps while preserving acceptable speech quality.
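As a quick check of the figures quoted above, the arithmetic below works out the PCM bit rate, the bits per sample left after reduction to 16 kbps, and the storage needed for one minute of speech at each rate (using 1 kB = 1000 bytes).

```latex
\begin{aligned}
R_{\text{PCM}} &= f_s \, b = 8000~\tfrac{\text{samples}}{\text{s}} \times 8~\tfrac{\text{bits}}{\text{sample}} = 64~\text{kbps},\\
b_{\text{ADPCM}} &= \frac{16~\text{kbps}}{8000~\text{samples/s}} = 2~\text{bits/sample} \quad (\text{a } 4{:}1 \text{ reduction}),\\
S_{60\,\text{s}} &= 64~\text{kbps} \times 60~\text{s} = 3.84~\text{Mbit} = 480~\text{kB} \quad \text{vs.} \quad 16~\text{kbps} \times 60~\text{s} = 0.96~\text{Mbit} = 120~\text{kB}.
\end{aligned}
```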

Personal voice web 300 uses conventional speaker depen- 
dent recognition of discrete speech. This conventional 
speaker dependent recognition relies on digital sampling of 
the word utterances. After sampling, the next stage is 
acoustic signal processing. Most techniques include spectral 
analysis. This is followed by recognition of phonemes, 
groups of phonemes and words. This stage uses many 
conventional processes such as Dynamic Time Warping, 
Hidden Markov Modeling, Neural Networks, expert systems 
and combinations of techniques. Hidden Markov Modeling
based techniques are commonly used and generally the most 
successful approach. Additionally, personal voice web 300 
uses some knowledge of the language to aid the recognition 
process. 
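Dynamic Time Warping, one of the conventional techniques listed above, is a natural fit for speaker dependent isolated-word recognition. Each vocabulary word has a stored training template, and an utterance is assigned to the word whose template it matches with the lowest time-warped distance. In the minimal sketch below, the feature sequences are toy one-dimensional stand-ins for the output of spectral analysis, and the templates play the role of a subscriber's speech training data.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences.

    Each element is a feature vector (tuple of floats); the local cost is
    the squared Euclidean distance, and the warping path may stretch or
    compress either sequence in time.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1]))
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def recognize(utterance, templates):
    """Return the vocabulary word whose training template is closest."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


# Toy templates: the utterance is a time-stretched version of "yes".
templates = {"yes": [(1.0,), (2.0,), (3.0,)], "no": [(3.0,), (1.0,), (3.0,)]}
utterance = [(1.0,), (1.0,), (2.0,), (3.0,), (3.0,)]
```

Time warping lets a word spoken more slowly or quickly than its training sample still align with it, which matters for the variable speaking rates of telephone users.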

Personal voice web 300 improves speaker dependent 
recognition of discrete speech in a command and control 
context using universally accessible personal speech train- 
ing profiles 401-427. As described above, the personal 
speech training pages 401-427 are organized as a linked 
collection of voice web profile pages each linked to the 
corresponding personal voice web service page. Thus, the 
personal speech training profile pages parallel the personal 
voice web service pages in structure, as shown in FIGS. 3 and
4. Each speech training page 401-427 contains the training
vocabulary for browser command and control that is context 
dependent. 

Each service page 302-327 linked to the personal voice
web home page 301 has a corresponding speech training
page 402-427. The personal voice web 300 is constructed in 
such a way that each voice web service page 302-327 links 
to its corresponding speech training page 401-427 using its 
URL. As the subscriber navigates from service page to 
service page in the personal voice web 300, the system is 
able to access the corresponding speech training page using 
its embedded URL. 
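Navigation-driven loading of speech training pages can be sketched as a lookup from each service page to the training-page URL it embeds (page 3xx to training page 4xx), with a cache so that revisiting a page does not refetch its training data. The URLs and vocabularies are hypothetical. `functools.lru_cache` stands in for the caching the system is described as using, and retrieval over HTTP is replaced by a dictionary lookup.

```python
from functools import lru_cache

# Hypothetical URLs: each service page embeds the URL of its speech
# training page (page 3xx -> training page 4xx, following FIGS. 3 and 4).
SERVICE_PAGES = {
    301: "http://profiles.example/1234567/speech/401",
    309: "http://profiles.example/1234567/speech/409",
    311: "http://profiles.example/1234567/speech/411",
}

# Stand-in for the remote speech training pages (vocabulary only).
TRAINING_PAGES = {
    "http://profiles.example/1234567/speech/401": ("help", "calendar", "stocks"),
    "http://profiles.example/1234567/speech/409": ("year", "month", "day", "november"),
    "http://profiles.example/1234567/speech/411": ("stock", "quote", "volume", "symbol"),
}

FETCHES = []  # records real retrievals, to show the effect of caching


@lru_cache(maxsize=32)
def fetch_training_page(url):
    """Simulated retrieval of a speech training page by URL."""
    FETCHES.append(url)
    return TRAINING_PAGES[url]


def navigate(page):
    """Enter a service page and return the vocabulary activated for it."""
    return fetch_training_page(SERVICE_PAGES[page])
```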

Each speech training page 401-427 contains a set of 
command and control key words and their personalized 
speech recognition patterns representing the context sensi- 
tive vocabulary for the corresponding service page. For 
example, the calendar and appointments service page 309 is 
linked to a corresponding speech training page 409 contain- 
ing key words and recognition patterns for "year", "month", 
"day", the names of the months and days, digits representing 
dates and times etc. Similarly, stock portfolio page 311 is
linked to a corresponding speech training page 411 contain- 
ing key words and recognition patterns for "stock", "quote", 
"volume", "option", "symbol", names of companies in the 
portfolio etc. 

FIG. 7 is a flow diagram of a speech recognition process 
700 in accordance with the present invention. The process is 
initiated after a subscriber has gained access 701 to the 
personal voice web in accordance with the process described 
in reference to FIG. 6. Once the subscriber gains access to 
the personal voice web 701, the login agent accesses the 
subscriber's personal voice web home page and presents 702 
the home page to the subscriber over the phone. During the 
process of presenting 702 the home page, the login agent 
loads the personal voice web profile page 302 and the speech
profile page 501 containing the command and control
vocabulary for the home page. This vocabulary includes the
basic voice web browser command and control as well as 
home page specific command and control. From the home 
page, the subscriber requests a particular service (i.e. per- 
sonal administrative assistant, the personal helpdesk or the 
personal catalog store). The home page agent determines 
703 what service the subscriber has selected and in response, 
invokes 704 the selected service and then proceeds to deliver 
705 the service. During invocation 704 of the service, both 
the service page and the speech training page associated 
with the service page are loaded on the voice web gateway 
where the voice web browser uses them to deliver the 
service and improve speech recognition. 

During delivery 705 of the selected service, the service 
agent uses the speech training page associated with the 
selected service to recognize voice commands submitted 
720 by the subscriber. Specifically, the service agent obtains 
the speech training profile, embeds it in the service page as 
a MIME resource and forwards it to the voice web browser 
which uses the training profiles to improve recognition. 
Thus, when responding to the subscriber's voice commands
pertinent to the accessed voice web service page, the voice web
browser recognizes the command and control word utter- 
ances (the subscriber's voice commands that are submitted 
720) and matches them against the personalized vocabulary 
in the corresponding voice web speech training page for 
accurate speaker dependent recognition of discrete speech. 

If the subscriber requests access to a new service page 
linked to a currently accessible service page, the currently 
active service agent exits 706 the current service and then 
invokes 704 the requested service. During the invocation of 
the requested service, the requested voice web service page 
corresponding to the requested service is loaded as well as 
the corresponding speech training page containing the 
matching command and control vocabulary. In this process
700, the active service agent always uses the most appropriate
vocabulary for the current context, thereby greatly
reducing the size of the active vocabulary that needs to be
accessed while significantly improving speaker dependent
recognition.
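The per-page vocabulary switching in process 700 can be sketched as a lookup from each service page URL to its speech training page URL. In this Python sketch the URLs, the key-word sets and the `active_vocabulary` helper are hypothetical; the point it illustrates is that only the basic browser commands plus the current page's key words are active at any time.

```python
from typing import Dict, Set

# Hypothetical URLs: each service page links to its speech training page.
TRAINING_PAGE_FOR: Dict[str, str] = {
    "http://voiceweb.example/alice/home.vml": "http://voiceweb.example/alice/speech/home.vml",
    "http://voiceweb.example/alice/calendar.vml": "http://voiceweb.example/alice/speech/calendar.vml",
    "http://voiceweb.example/alice/stocks.vml": "http://voiceweb.example/alice/speech/stocks.vml",
}

# Hypothetical key words held by each training page (recognition patterns omitted).
KEY_WORDS_AT: Dict[str, Set[str]] = {
    "http://voiceweb.example/alice/speech/home.vml": {"calendar", "stocks", "email"},
    "http://voiceweb.example/alice/speech/calendar.vml": {"year", "month", "day", "january", "monday"},
    "http://voiceweb.example/alice/speech/stocks.vml": {"stock", "quote", "volume", "option", "symbol"},
}

# Basic browser commands that stay available on every page.
BROWSER_COMMANDS: Set[str] = {"previous", "next", "home", "help", "stop", "exit"}


def active_vocabulary(service_page_url: str) -> Set[str]:
    """Vocabulary to load when the subscriber enters a service page."""
    training_page_url = TRAINING_PAGE_FOR[service_page_url]  # follow the embedded URL
    return BROWSER_COMMANDS | KEY_WORDS_AT[training_page_url]


calendar_vocab = active_vocabulary("http://voiceweb.example/alice/calendar.vml")
stock_vocab = active_vocabulary("http://voiceweb.example/alice/stocks.vml")
```

Leaving the calendar page for the stock portfolio page drops "month" and adds "quote", so the recognizer never has to discriminate among the union of all pages' words.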

Query Localization and Customization

Query customization uses stored subscriber attributes and 
preferences to customize queries of service databases. Query 
customization is accomplished by maintaining user 
attributes and preferences in a collection of voice web pages 
501-527 (described above in reference to FIG. 5) that
parallel the corresponding voice web service pages 301-327
(described above in reference to FIG. 3) and using the
attribute and preferences information corresponding to the 
service requested to customize the query parameters within 
forms. 

Referring now again to FIG. 5, the attributes and prefer- 
ences pages 501-527 parallel the personal voice web service 
pages 301-327 in structure as shown in FIG. 3. Each service 
page linked to the personal voice web home page 301 has a 
corresponding voice web attributes and preferences page 
linked to it. The personal voice web 300 is constructed in
such a way that each voice web service page 301-327 links 
to its corresponding voice web attributes and preferences 
page 501-527 using its URL. As the subscriber navigates 
from service page to service page in the personal voice web 
300, the system is able to access the corresponding voice 
web attributes and preferences page using its embedded 
URL. 
A subscriber of voice web services requests information
by accessing a voice web service page and having it played
by the corresponding agent (i.e. administrative assistant,
helpdesk or commerce agent). The subscriber requests service
by submitting a query form presented by the
corresponding agent. The query form is an HVML form for 
touch tone and voice data input. When a service is requested 
by the subscriber, the agent retrieves the corresponding 
voice web attributes and preferences page and automatically 
fills the query form with appropriate default parameters 
obtained from the subscriber's attributes and preferences. 
For example if the subscriber is accessing the weather 
service page, the agent fills in the subscriber's home town 
and other chosen cities automatically from the subscriber's 
attributes and preferences page. Similarly, if the subscriber 
is accessing the stock portfolio service page, the agent 
accesses the corresponding attributes and preferences page 
and fills in the subscriber's chosen portfolio of stocks in the 
query form. In addition, the agent also automatically fills in
the appropriate subscriber attributes such as his/her access
account number, password etc., thereby easing the subscriber's
access while exploiting the availability of services through
web based queries.

FIG. 8 is a flow diagram of a query customization process 
800 in accordance with the present invention. The process is 
initiated after a subscriber has gained access 801 to the 
personal voice web in accordance with the process described 
in reference to FIG. 6. Once the subscriber gains access 801
to the personal voice web, the login agent accesses the 
subscriber's personal voice web home page and presents 802 
the home page to the subscriber over the phone. 

During the process of presenting 802 the home page, the 
login agent loads the attributes and preferences page 501 
from the subscriber's voice web personal profile. Attributes 
and preferences page 501 contains preferences for the home 
page 301. From the home page 301, the subscriber accesses 
the targeted voice web service page by navigating the 
appropriate hyper links from the voice web home page 301. 
In response, the selected service is invoked 803 and the 
selected service then proceeds to deliver 804 the service. 
During invocation 803 of the selected service, both the 
service page and the attributes and preferences page asso- 
ciated with the service page are extracted by the service 
agent. 

During delivery 804 of the selected service, the service 
agent uses the attributes and preferences page associated 
with the selected service to customize queries of the asso- 
ciated service database. More specifically, using the 
attributes and preferences information, the service agent 
automatically fills in the needed fields in the corresponding 
query form with user specified defaults and preferences. 
Having filled the appropriate fields, the service agent plays 
the remaining query form to the subscriber thereby greatly 
reducing the information that the subscriber has to supply on 
the telephone. The service agent then obtains the remaining 
information, if any, from the subscriber and submits the 
query form to the service database. When the results are
returned (i.e. the information is retrieved from the service 
database), the service agent plays the results to the sub- 
scriber over the telephone. 
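The delivery step of process 800 amounts to merging stored preferences into the query form and then prompting only for what is still missing. The Python sketch below assumes the form is a list of field names and that the attributes and preferences page is keyed by the same names; `customize_and_submit`, `ask_caller` and `submit` are hypothetical stand-ins for the service agent, the telephone prompt and the service database.

```python
from typing import Callable, Dict, List, TypeVar

Result = TypeVar("Result")


def customize_and_submit(
    form_fields: List[str],
    preferences: Dict[str, str],
    ask_caller: Callable[[str], str],
    submit: Callable[[Dict[str, str]], Result],
) -> Result:
    """Fill a query form from stored preferences, prompt for the rest, then submit."""
    # Pre-fill every field that the attributes and preferences page can answer.
    query = {name: preferences[name] for name in form_fields if name in preferences}
    # Play only the remaining prompts to the caller.
    for name in form_fields:
        if name not in query:
            query[name] = ask_caller(name)
    # Submit to the service database; the caller hears whatever comes back.
    return submit(query)


# Usage with stand-ins for the telephone prompt and the service database.
prompted: List[str] = []


def fake_prompt(field_name: str) -> str:
    prompted.append(field_name)
    return "tomorrow"


results = customize_and_submit(
    ["account", "city", "date"],
    {"account": "12345", "city": "Austin"},
    fake_prompt,
    lambda query: sorted(query.items()),
)
```

Only "date" is prompted for; the account number and home town come from the stored profile, which is the reduction in caller input the text describes.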

Form Based Voice Web Page Publishing 

In another aspect of the invention, voice web system 100 
enables publishers to compose voice web forms and pages 
statically using ordinary word processing programs and link 
them to voice files created using ordinary audio capture and 
editing tools available on personal computers and workstations.
Alternatively, voice web agents can dynamically compose voice
web pages and forms based on user requests and, optionally,
user profiles as well as accessed databases and services.
Advantageously, dynamic form-based publication enables
information and service providers to publish voice web pages
using the conventional telephone without the need for any
additional computer based voice web publishing tools. Dynamic
form-based publication is achieved by combining voice web
publishing forms, voice web publishing agents and voice web
page publishing templates.

FIG. 9 is a flow diagram of a voice publishing method in 
accordance with the present invention. The method presents 
901 a voice web form to a caller calling into a voice web 
system using a conventional telephone. Voice web publishing
forms are specially designed voice web forms that when
interpreted (i.e. when played back) using the voice browser 
prompt the caller (the voice information publishers) to input 
voice and touch tone based input using a telephone. The 
forms guide the caller step by step to supply the needed 
information, edit and modify the information and finally 
submit 903 the information for processing 902. 

Voice web publishing agents process 902 the filled voice 
web publishing forms extracting and separating voice infor- 
mation and touch tone input. Based on the touch tone inputs, 
the agents may present additional publishing forms to the 
caller (publisher). The voice information is stored 904 in 
voice files and linked to the corresponding voice web page 
publishing template by substituting variables within the 
page template with the generated files. The touch tone input
is used whenever the caller (publisher) needs to input 
alphanumeric information that can be processed by the 
publishing agent. 
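Substituting generated voice files for variables in a page publishing template can be sketched with Python's standard `string.Template`. The template text, the variable names and the file paths below are hypothetical; the source does not specify the template syntax its publishing agents use.

```python
from string import Template
from typing import Dict

# Hypothetical voice web page publishing template; each $variable stands for a
# voice file generated from the caller's recorded input.
PAGE_TEMPLATE = Template(
    '<VOICE TYPE="File" SRC="$greeting_file" TEXT="Greeting">\n'
    '<VOICE TYPE="File" SRC="$hours_file" TEXT="Working hours">\n'
)


def publish_page(voice_files: Dict[str, str]) -> str:
    """Link stored voice files into the page by substituting template variables."""
    # substitute() raises KeyError when a variable has no file, so an
    # incomplete publishing form never yields a half-filled page.
    return PAGE_TEMPLATE.substitute(voice_files)


page = publish_page({"greeting_file": "acme/greeting.vox", "hours_file": "acme/hours.vox"})
print(page)
```

Using `substitute` rather than `safe_substitute` is a deliberate choice in this sketch: a missing recording surfaces as an error instead of a page that still contains a raw `$variable`.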

Voice Web White, Yellow and Order Pages 

Without limiting the general applicability of form based 
voice web page publishing, a specific application of the 
process of form-based publishing is next described. The 
exemplary form based publishing process relates to the
publication of voice web business white pages, yellow pages
and order entry pages. FIG. 10 shows a white-yellow-order
page system 1000 in accordance with the present invention.
Voice web business white pages 1001 are voice web pages
that are dynamically composed by the voice web business
white pages agent 1003 from information in a business white
page database 1002, including the name, address and phone
number of each business. The white pages agent 1003 presents a
search form to a caller for specifying the name of the 
business and allows further narrowing of the search by city 
and state. Each business white page can be linked to a 
corresponding business yellow page 1004. Business yellow 
pages 1004 contain additional information about the business
including a tag line, advertisement, directions, working
hours, and promotions. In addition, each yellow page 1004 
can be linked to a corresponding business order entry form 
1005. Business order entry forms 1005 allow users to order 
products and services or transact business by specifying 
product or service codes, preferences, quantity, and credit 
card numbers for payment. 

A participating business can publish a voice web yellow 
page 1004 by simply filling in a corresponding voice web
yellow page publishing form 1007. A yellow page publish- 
ing agent 1006 processes the yellow page publishing form 
1007 and dynamically generates a business yellow page 
1004 for that business from a standard yellow page template 
by replacing variables in the template with values supplied 
by the submitted yellow page publishing form. 
The yellow page publishing agent 1006 (a publishing
agent) presents a yellow page voice web publishing form
1007 to the participating business. Voice web publishing
forms are specially designed voice web forms that when
interpreted (i.e. when played back) using the voice browser 
prompt the caller (the voice information publishers) to input 
voice and touch tone based input using a telephone. Yellow 
page publishing form 1007 guides the caller step by step to 
supply the needed information, edit and modify the infor- 
mation and finally submit the information for processing, as 
described in reference to FIG. 9. Specifically, yellow page 
publishing form 1007 prompts for voice information includ- 
ing name, tag line, advertisement, directions, working hours 
and promotions. In addition, the yellow page publishing 
agent 1006 prompts for touch tone input including the 
account number, password, phone number, yellow page 
category code and credit card number. Yellow page publish- 
ing agent 1006 uses the account number to identify the 
business, the password to verify the business, the phone 
number to link it to the corresponding white page, the yellow 
page category code to classify the business within business 
yellow pages, and the credit card number to pay for the 
business yellow page. Once the business is identified and 
verified, yellow page publishing agent 1006 dynamically 
creates a business yellow page 1004 from a standard tem- 
plate for the appropriate category. Yellow page publishing 
agent 1006 uses the supplied business phone number to 
match with the appropriate database entry in the business 
white pages and updates it with the URL of the newly 
created yellow page to link it. 
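The final linking step, matching the supplied phone number against the white page database and recording the URL of the newly created yellow page, might look like the following Python sketch. The record layout, the digits-only phone normalization and the example URL are assumptions, not details from the source.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class WhitePageEntry:
    name: str
    address: str
    phone: str
    yellow_page_url: Optional[str] = None


def digits_only(phone: str) -> str:
    """Normalize a phone number so '512-555-0100' and '5125550100' match."""
    return "".join(ch for ch in phone if ch.isdigit())


def link_yellow_page(white_pages: Dict[str, WhitePageEntry],
                     phone: str, yellow_page_url: str) -> WhitePageEntry:
    """Find the business by phone number and store its new yellow page URL."""
    entry = white_pages[digits_only(phone)]  # KeyError if the business is unlisted
    entry.yellow_page_url = yellow_page_url
    return entry


# Hypothetical white page database keyed by normalized phone number.
WHITE_PAGES = {
    "5125550100": WhitePageEntry("Acme Bakery", "1 Main St, Austin, TX", "5125550100"),
}
link_yellow_page(WHITE_PAGES, "512-555-0100", "http://voiceweb.example/yellow/acme.vml")
```

The order entry case described next would follow the same pattern, additionally updating the yellow page found through the stored URL.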

A very similar process occurs for publishing order entry
forms. A business order entry form publishing agent, order
page publishing agent 1008, presents an appropriate order
entry publishing form 1009 to a participating business.
Order page publishing agent 1008 requests appropriate
customized prompts for specific fields in the business order
entry form such as product or service code, customer
preferences, quantity, credit card number etc. Order page
publishing agent 1008 also requests touch tone input for
the account number, password, phone number, and credit
card number. Order page publishing agent 1008 uses the
account number and password for identification and
verification, the phone number to link it to the corresponding
yellow page 1004 and the credit card number for payment
for the order entry form. Once the business is identified and
verified, order page publishing agent 1008 dynamically
generates an order entry form for that business by filling the
supplied information into a standard order entry template for
that business category. Order page publishing agent 1008
uses the supplied business phone number to match with the
appropriate database entry in the business white pages,
updates it with the URL of the newly created order entry
page, locates the corresponding yellow page using its URL
in the database, and updates it to link to the newly created
order entry page.

The foregoing discussion discloses and describes merely 
exemplary embodiments of the present invention. As will be
understood by those familiar with the art, the invention may 
be embodied in other specific forms without departing from 
the spirit or essential characteristics thereof. Accordingly, 
the disclosure of the present invention is intended to be 
illustrative, but not limiting, of the scope of the invention, 
which is set forth in the following claims. 
I. HVML Specification



Hyper Voice Markup Language (HVML) consists of a set of extensions to existing HTML. Some
of the extensions are new elements with new tags and attributes. Others are extensions to
existing elements in the form of new attributes. All attribute values are shown as %value
type%.

In-line Voice components 

The primary mechanism for introducing voice prompts into an HTML page is a new
inline voice HVML element similar to the inline image HTML element. The tag for this
element is "VOICE" and it has many variations. Each variation is specified by the value of
the TYPE attribute. Depending on the type, each variation has additional attributes.
Voice Files

<VOICE TYPE="File" SRC="%URL%" TEXT="%text%">

The VOICE tag with TYPE set to "File" indicates a file containing pre-recorded voice
information. Its attributes are SRC and TEXT. The SRC attribute specifies the URL of the
voice file, and the TEXT attribute, which is optional, specifies text that can be translated
to speech as an alternative to the voice file.

Voice Index Files

<VOICE TYPE="Index" SRC="%URL%" INDEX="%index%" TEXT="%text%">
The VOICE tag with TYPE set to "Index" indicates an indexed file containing pre-recorded
voice phrases. Its attributes are SRC, INDEX and TEXT. SRC and TEXT have the same
meaning as in Voice Files. The INDEX attribute specifies the index of the phrase within the
file, either as a number or as a label.
For example: 

<VOICE TYPE="File" SRC="myweb/home/greeting.wav">
Text-to-Speech

<VOICE TYPE="Text" TEXT="%text%">

The VOICE tag with TYPE set to "Text" indicates a text-to-speech string. Its attribute is
TEXT, which specifies the string that needs to be translated to speech.
For example:

<VOICE TYPE="Text" TEXT="Welcome to your Home Page">
Voice Streams: 

<VOICE TYPE="Stream" VALUE="%URL%" TERMINATE="%tone%">

The VOICE tag with TYPE set to "Stream" indicates a continuous voice stream identified by
its URL. The browser accesses the voice stream and continuously plays it to the user. Its
attributes are VALUE, which specifies the URL of the stream, and TERMINATE, which
specifies the tone the user can enter to terminate the playback.

Currency 

<VOICE TYPE="Money" VALUE="%number%" FORMAT="%format%">
The VOICE tag with TYPE set to "Money" indicates a number that needs to be presented as
currency. Its attributes are VALUE and FORMAT. VALUE specifies the decimal value
of the number, and FORMAT, which is optional, specifies the currency type such as "US
Dollar", "British Pound" etc. The default value for FORMAT is "US Dollar".
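A Money element has to be split into major and minor units before it can be spoken. This Python sketch assumes that each FORMAT name maps to the unit words in the hypothetical `CURRENCY_UNITS` table (the appendix names only the FORMAT values, not the spoken units), rounds to two decimal places, and leaves the amounts as digits for the text-to-speech engine to voice.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumed unit words for each FORMAT value: (major singular, major plural,
# minor singular, minor plural).
CURRENCY_UNITS = {
    "US Dollar": ("dollar", "dollars", "cent", "cents"),
    "British Pound": ("pound", "pounds", "penny", "pence"),
}


def money_phrase(value: str, fmt: str = "US Dollar") -> str:
    """Turn a Money VALUE into a phrase such as '12 dollars and 50 cents'."""
    major_one, major_many, minor_one, minor_many = CURRENCY_UNITS[fmt]
    amount = Decimal(value).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    total_minor = int(amount * 100)
    if total_minor < 0:
        raise ValueError("negative amounts are outside this sketch")
    major, minor = divmod(total_minor, 100)
    phrase = f"{major} {major_one if major == 1 else major_many}"
    if minor:
        phrase += f" and {minor} {minor_one if minor == 1 else minor_many}"
    return phrase
```

`Decimal` is used instead of `float` so that a VALUE such as "0.29" is not turned into 28 cents by binary rounding.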
Numbers 

<VOICE TYPE- "Number" VALUE- "%number%" FORMAT- "%format%"> 

The VOICE tag with TYPE set to "Number" indicates a number that needs to be presented as
a decimal number. Its attributes are VALUE and FORMAT. VALUE specifies the
decimal value, and FORMAT, which is optional, specifies the precision to be conveyed.
Digits after the decimal point are pronounced as individual characters. The default value
for FORMAT is 2, which indicates 2-digit precision after the decimal point.
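The Number rule (round to FORMAT digits, then voice each fractional digit separately) can be sketched as a tokenizer whose output a text-to-speech engine would read in order. The function name `number_tokens` and the choice of half-up rounding are assumptions; the appendix states only the precision and the digit-by-digit pronunciation.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import List

DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def number_tokens(value: str, precision: int = 2) -> List[str]:
    """Whole part as a number, then 'point', then one token per fractional digit."""
    quantum = Decimal(1).scaleb(-precision)  # e.g. 0.01 for precision 2
    rounded = Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)
    text = format(rounded, "f")  # fixed-point, never scientific notation
    whole, _, fraction = text.partition(".")
    tokens = [whole]
    if fraction:
        tokens.append("point")
        tokens.extend(DIGIT_NAMES[int(d)] for d in fraction)
    return tokens
```

With the default precision of 2, "3.14159" becomes 3, point, one, four, and "2.5" is padded to 3 tokens after the point count as two digits: point, five, zero.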

Characters 

<VOICE TYPE="Character" VALUE="%string%">

The VOICE tag with TYPE set to "Character" indicates a sequence of characters that are to be
presented separately with no pauses in between. Its attribute is VALUE, which specifies
the sequence of characters as a string.
Dates 

<VOICE TYPE="Date" VALUE="%date%" FORMAT="%format%">

The VOICE tag with TYPE set to "Date" indicates an expression that is to be presented as a
date. Its attributes are VALUE and FORMAT. The VALUE attribute specifies the expression,
and the FORMAT attribute, which is optional, specifies the format of the expression.
The default format is MM/DD/YY.
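Parsing a Date VALUE under its FORMAT can be sketched with Python's `datetime.strptime`. The translation of the appendix's MM, DD, YY and YYYY letters into strptime directives is an assumption, as is returning a (month name, day, year) tuple for the speech engine. Note that `%y` follows the POSIX convention: two-digit years 69 to 99 map to the 1900s and 00 to 68 to the 2000s.

```python
from datetime import datetime
from typing import Tuple


def date_parts(value: str, fmt: str = "MM/DD/YY") -> Tuple[str, int, int]:
    """Split a Date VALUE into (month name, day, year) for speech output."""
    # Assumed mapping from the appendix's format letters to strptime directives;
    # YYYY is replaced before YY so four-digit years are not split.
    directive = (fmt.replace("MM", "%m")
                    .replace("DD", "%d")
                    .replace("YYYY", "%Y")
                    .replace("YY", "%y"))
    parsed = datetime.strptime(value, directive)
    # %B gives the full month name (English under the default C locale).
    return parsed.strftime("%B"), parsed.day, parsed.year
```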

Ordinals 

<VOICE TYPE- "Ordinal" VALUE- "%number%"> 

The VOICE tag with TYPE set to "Ordinal" indicates a number that is to be presented as an
ordinal (i.e. as the Nth value). Its attribute is VALUE, which specifies the number. Values
are pronounced as "first", "second", "third" etc.
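A minimal Python sketch of the Ordinal rule: numbers from 1 to 99 are spelled as English ordinal words, and anything else falls back to digits plus a suffix ("101st"). The cut-off at 99 and the fallback are simplifications assumed for brevity, not behavior stated in the appendix.

```python
ONES = ["", "first", "second", "third", "fourth",
        "fifth", "sixth", "seventh", "eighth", "ninth"]
TEENS = ["tenth", "eleventh", "twelfth", "thirteenth", "fourteenth",
         "fifteenth", "sixteenth", "seventeenth", "eighteenth", "nineteenth"]
TENS_CARDINAL = ["", "", "twenty", "thirty", "forty",
                 "fifty", "sixty", "seventy", "eighty", "ninety"]
TENS_ORDINAL = ["", "", "twentieth", "thirtieth", "fortieth",
                "fiftieth", "sixtieth", "seventieth", "eightieth", "ninetieth"]


def ordinal_words(n: int) -> str:
    """Spell 1..99 as ordinal words; otherwise use digits with a suffix."""
    if not 1 <= n <= 99:
        # 11th, 12th and 13th (and 111th, ...) take 'th' despite their last digit.
        if 10 <= n % 100 <= 20:
            suffix = "th"
        else:
            suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
        return f"{n}{suffix}"
    if n < 10:
        return ONES[n]
    if n < 20:
        return TEENS[n - 10]
    tens, ones = divmod(n, 10)
    if ones == 0:
        return TENS_ORDINAL[tens]
    return f"{TENS_CARDINAL[tens]}-{ONES[ones]}"
```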
Strings: 

<VOICESTRING NAME="%name%">
. . . Voice Components . . . 
</VOICESTRING> 

The VOICESTRING tag indicates a sequence of voice components that are grouped together
for presentation without any pauses in between. Each of the voice components can be
any of the primitives previously defined. The voice browser gathers the individual
components and plays them together in sequence. For example:

<VOICESTRING NAME="welcome">
<VOICE TYPE="Index" SRC="welcome.vap" INDEX="begin" TEXT="Welcome">
<VOICE TYPE="File" SRC="username.vox" TEXT="user's name">
<VOICE TYPE="Index" SRC="welcome.vap" INDEX="end" TEXT="to VOIS NET">
</VOICESTRING>

The voice browser "plays" each in-line voice component in sequence as it encounters it in 
the HVML page starting from the beginning of the page. Each voice component is played 
only once for each presentation. A "reload" command would cause the voice browser to 
re-play the page. 
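The in-order playback rule can be sketched with Python's standard `html.parser`, which already tolerates unclosed VOICE tags and unknown elements such as VOICESTRING. The class and function names are hypothetical, and the sketch only builds the playlist; fetching and playing the audio is out of scope.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class VoiceComponentCollector(HTMLParser):
    """Collect in-line VOICE elements in the order the voice browser meets them."""

    def __init__(self) -> None:
        super().__init__()
        self.playlist: List[Dict[str, Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names but keeps values as written.
        if tag == "voice":
            self.playlist.append(dict(attrs))


def playlist_for(page: str) -> List[Dict[str, Optional[str]]]:
    """Return the VOICE components of an HVML page in presentation order."""
    collector = VoiceComponentCollector()
    collector.feed(page)
    collector.close()
    return collector.playlist


WELCOME = (
    '<VOICESTRING NAME="welcome">'
    '<VOICE TYPE="Index" SRC="welcome.vap" INDEX="begin" TEXT="Welcome">'
    '<VOICE TYPE="File" SRC="username.vox" TEXT="user\'s name">'
    '<VOICE TYPE="Index" SRC="welcome.vap" INDEX="end" TEXT="to VOIS NET">'
    '</VOICESTRING>'
)
for component in playlist_for(WELCOME):
    print(component["type"], component["src"], component.get("index"))
```

A "reload" command would simply rebuild and replay this list from the start of the page.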

Of course, voice elements can also be invoked by hyperlinks pointing to voice files
containing digitized voice data. This is similar to existing HTML conventions. The voice 
browser simply fetches the new page and plays it once. In the next section, we will 
discuss how hyperlinks can be invoked using touch tone or key word input. 
Voice responsive labels for hyper-links

In order to invoke hyperlinks embedded in an HVML page, two new attributes, "TONE"
and "LABEL", are added to the anchor element. These attributes are used in conjunction
with the existing HREF attribute in an anchor element that makes the anchor into a
hyperlink. When the user enters the touch tone signals specified by the value of the TONE
attribute followed by the "#" tone, or utters the word specified by the LABEL attribute,
the browser invokes the corresponding hyperlink. The TONE and LABEL attribute
values must be unique within a page.
For example:

<A HREF="myweb/home/greeting.vml" TONE="HELLO">
or

<A HREF="myweb/home/greeting.vml" LABEL="HELLO">

When the user presses "HELLO#" on the touch tone phone or says the
word "HELLO" on the phone, the browser invokes the corresponding hyperlink and
accesses the "greeting.vml" page.
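Because several letters share one telephone key, two different TONE words can produce the same digits (for instance "HOME" and "GOOD" both dial 4663), so uniqueness is best checked on the dialed digits rather than on the letters. This Python sketch uses the standard keypad letter groups; `to_keys` and `check_unique_tones` are hypothetical helpers, not part of HVML.

```python
from typing import Dict, List

# Standard telephone keypad letter groups.
KEYPAD = {"2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
          "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ"}
LETTER_TO_KEY = {letter: key for key, letters in KEYPAD.items() for letter in letters}


def to_keys(label: str) -> str:
    """Translate a TONE value such as 'HELLO#' into the keys actually pressed."""
    # Digits, '*' and '#' pass through unchanged.
    return "".join(LETTER_TO_KEY.get(ch, ch) for ch in label.upper())


def check_unique_tones(labels: List[str]) -> Dict[str, str]:
    """Map dialed key sequences to TONE values, rejecting keypad collisions."""
    seen: Dict[str, str] = {}
    for label in labels:
        keys = to_keys(label)
        if keys in seen:
            raise ValueError(f"{label!r} and {seen[keys]!r} both dial {keys}")
        seen[keys] = label
    return seen
```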

Keyword accessible indexes for anchors 

HTML allows indexed access to fragments within a page by unique labels associated
with anchors surrounding the fragment. The NAME attribute in an anchor element
specifies a label that is unique within the page. This label can then be used as an index by
the browser to search for the fragment by matching the unique label with the one supplied
in the hyperlink. The hyperlink for the indexed fragment uses the regular URL for the
page concatenated with the fragment's unique label, separated by "#".

Coupled with voice responsive hyperlinks, fragment labels can be used to construct
simple menus or database searches.

For example: 

Suppose "myweb/home/prompts.vml" contains the following HVML text.

<A NAME="prompt1">
<VOICE TEXT="Press CAL# for Calendar">
</A>
<A NAME="prompt2">
<VOICE TEXT="Press ADDR# for Address Book">
</A>
<A NAME="prompt3">
<VOICE TEXT="Press EMAIL for Electronic Mail">
</A>

Suppose another HVML page contains the following hyperlinks.

<A HREF="myweb/home/prompts.vml#prompt1" TONE="1">Press 1 to hear Prompt1</A>
<A HREF="myweb/home/prompts.vml#prompt2" TONE="2">Press 2 to hear Prompt2</A>
<A HREF="myweb/home/prompts.vml#prompt3" TONE="3">Press 3 to hear Prompt3</A>

Then, if the user presses "1#", the browser will fetch the "myweb/home/prompts.vml"
HVML page, match the "prompt1" index with the first anchor's "prompt1" label, and start
presenting the prompts, beginning with the text-to-speech translation of "Press CAL# for
Calendar".
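The fragment lookup in this example can be sketched by splitting the HREF with `urllib.parse.urldefrag`, then recording where each named anchor falls among the page's voice components. `PAGES` is a hypothetical in-memory stand-in for fetching the page from the server, and starting at the top of the page for an unknown label mirrors ordinary HTML browser behavior rather than anything the appendix states.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional
from urllib.parse import urldefrag


class AnchoredVoiceParser(HTMLParser):
    """Record VOICE TEXT values in order and where each named anchor starts."""

    def __init__(self) -> None:
        super().__init__()
        self.voices: List[Optional[str]] = []
        self.anchor_at: Dict[str, int] = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "name" in attributes:
            # The fragment begins with the next voice component on the page.
            self.anchor_at[attributes["name"]] = len(self.voices)
        elif tag == "voice":
            self.voices.append(attributes.get("text"))


def present_from(href: str, pages: Dict[str, str]) -> List[Optional[str]]:
    """Resolve 'page#label' and return the prompts from that fragment onward."""
    page_url, label = urldefrag(href)
    parser = AnchoredVoiceParser()
    parser.feed(pages[page_url])
    parser.close()
    start = parser.anchor_at.get(label, 0)  # unknown or empty label: top of page
    return parser.voices[start:]


PAGES = {
    "myweb/home/prompts.vml": (
        '<A NAME="prompt1"><VOICE TEXT="Press CAL# for Calendar"></A>'
        '<A NAME="prompt2"><VOICE TEXT="Press ADDR# for Address Book"></A>'
        '<A NAME="prompt3"><VOICE TEXT="Press EMAIL for Electronic Mail"></A>'
    )
}
```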
Browser Control 

<PAUSE TIMEOUT="%seconds%" TERMINATE="%tone%">
In order to let the voice page publisher control the behavior of the voice browser,
HVML defines a "PAUSE" tag with "TIMEOUT" and "TERMINATE" attributes. When
the browser encounters a PAUSE statement, it pauses until either the amount of time
specified in the TIMEOUT attribute elapses or the user enters the tone specified in the
TERMINATE attribute. If the value of the TIMEOUT attribute is 0, then the browser
waits indefinitely. The default value for TIMEOUT is 1 second. The default value for
TERMINATE is "#".
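The PAUSE semantics can be sketched with a standard `queue.Queue` standing in for incoming touch tones. The queue-based input and the string return values are assumptions made for illustration. One detail worth getting right: the sketch keeps a single deadline, so stray tones that are not the TERMINATE tone do not extend the TIMEOUT.

```python
import queue
import time


def pause(tones: "queue.Queue[str]", timeout: float = 1.0, terminate: str = "#") -> str:
    """Block like the PAUSE tag: until TIMEOUT seconds pass or TERMINATE arrives.

    A timeout of 0 waits indefinitely, as the PAUSE description specifies.
    """
    deadline = None if timeout == 0 else time.monotonic() + timeout
    while True:
        remaining = None if deadline is None else deadline - time.monotonic()
        if remaining is not None and remaining <= 0:
            return "timeout"
        try:
            tone = tones.get(timeout=remaining)  # None blocks with no limit
        except queue.Empty:
            return "timeout"
        if tone == terminate:
            return "terminated"
        # Any other tone is ignored; keep waiting against the same deadline.


# Usage: a caller who presses 5 and then # ends the pause early.
keypresses: "queue.Queue[str]" = queue.Queue()
for key in ("5", "#"):
    keypresses.put(key)
result = pause(keypresses, timeout=2.0)
```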
Voice Responsive Forms

HVML uses the FORM tag to enable user input similar to HTML, including the
METHOD attribute, which specifies the way parameters are passed to the server, and the
ACTION attribute, which specifies the procedure to be invoked by the server to process
the form. HVML extends the INPUT tag within forms by introducing the VOICEINPUT tag.
VOICEINPUT takes a TYPE attribute similar to the INPUT tag, with three new values,
"voice", "tone" and "review", in addition to the existing "reset" and "submit" values.
The HVML browser pauses at each VOICEINPUT statement in an HVML form until the
specified input is supplied or input is terminated, before processing the remaining form.
The VOICEINPUT tag with TYPE value set to "voice" indicates a form that accepts
voice input. Usually, a voice prompt or text-to-speech segment precedes the
VOICEINPUT tag, alerting the user that input is required and how to terminate input. The
user is expected to speak, and this message is recorded in real time and supplied to the
Voice Web server for processing. The VOICEINPUT tag containing the "voice" value for the
TYPE attribute also supports a MAXTIME attribute, which specifies the maximum
recording time for the message in seconds, and a TERMINATE attribute, which specifies
the touch tone that terminates input. If the MAXTIME attribute is not specified, then the
default value of "15" is assumed. If the TERMINATE attribute is not specified, then the
default value of "#" is assumed. For example, if the MAXTIME value is 20 and the
TERMINATE value is "#", then recording terminates when the user presses "#" or 20
seconds of time elapses.

The VOICEINPUT tag with TYPE value set to "tone" indicates a form that accepts touch
tone input. Again, a voice prompt or a text-to-speech segment precedes the
VOICEINPUT tag, alerting the user that input is required. The user is expected to press a
sequence of touch tones, which are recorded and supplied to the Voice Web server for
processing. The VOICEINPUT tag containing the "tone" value for the TYPE attribute also
supports a MAXDIGITS attribute, which specifies the maximum number of touch tone
digits that can be supplied, and a TERMINATE attribute, which specifies the touch tone
that terminates input. If the MAXDIGITS attribute is not specified, then the default value
of "20" is assumed. If the TERMINATE attribute is not specified, then the default value of
"#" is assumed. For example, if the MAXDIGITS value is 10 and the TERMINATE value is
"#", then the input process terminates when the user presses "#" or 10 digits are supplied.
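The stopping rule for "tone" input, whichever comes first of the TERMINATE tone or MAXDIGITS digits, is small enough to sketch directly. The function name `collect_tones` is hypothetical; the defaults of 20 digits and "#" follow the values given for MAXDIGITS and TERMINATE, and the terminating tone itself is not included in the collected value.

```python
from typing import Iterable


def collect_tones(keys: Iterable[str], max_digits: int = 20, terminate: str = "#") -> str:
    """Gather touch tones until TERMINATE is pressed or MAXDIGITS digits arrive."""
    digits = []
    for key in keys:
        if key == terminate:
            break  # the terminating tone is not part of the input
        digits.append(key)
        if len(digits) == max_digits:
            break  # limit reached; stop listening without waiting for TERMINATE
    return "".join(digits)
```

For example, with MAXDIGITS set to 10, the key stream "12345#9" yields "12345", while thirteen digits with no "#" yield only the first ten.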
The VOICEINPUT tag with TYPE value set to "review" indicates that the current values
of the form can be reviewed by selecting the "review" input. The VOICEINPUT tag with
TYPE value set to "reset" indicates that the current values of the form should be reset to
their original defaults. The VOICEINPUT tag with TYPE value set to "submit" indicates
that the current form should be submitted to the server. Each of these three TYPE values
supports a SELECTTONES attribute and a SKIPTONES attribute. The SELECTTONES
attribute specifies the sequence of touch tones that activates the corresponding selection.
The SKIPTONES attribute specifies the sequence of touch tones that skips the selection. If
either attribute is not specified, a default tone sequence is assumed.
For example, if the SELECTTONES attribute value is "REVIEW" and the SKIPTONES
attribute value is "SKIP" for a VOICEINPUT element with TYPE value set to "review",
the user can enter "REVIEW" to review the form values or enter "SKIP" to skip the
selection. The VOICEINPUT tag with TYPE value set to "submit" similarly indicates that
the values of the form can be submitted to the server. If the SELECTTONES attribute value
is "DONE", the user can either enter "DONE" to submit the form or enter the SKIPTONES
sequence to skip the selection. The VOICEINPUT tag with TYPE value set to "reset"
similarly indicates that the values of the form should be reset to their original values.

II. Voice Browser Commands 



All browser commands must start with the "*" key. Each browser command is associated
with one or more key words that uniquely identify it. For example, in order to activate the
"Home" command, the user would press "*home" on the telephone key pad. The key
words are chosen in such a way as to generate unique touch tone sequences. A set of default
browser commands is listed below with the keyword and description of each command.
Alternatively, the browser commands can also be issued by vocalizing the corresponding
commands. For example, to activate the "Home" command, the user would say "home"
on the telephone.
Previous 

Jump to the previous page from which the current page was accessed via a hyper 
link. This command is activated by pressing "*pr" (*77) or "*prev" (*7738)
sequence. 
Next 

Jump to the next page in a sequence of hyper links. This command is activated by 

pressing "*n" (*6) or "*next" (*6398) sequence.

History 

Present the titles of the pages accessed so far in the order of their hyper link 
access sequence. Pause after each title. If the user presses then jump to the 
page specified by the title. If not, proceed to the next title. This command is 
activated by pressing "*hi" (*44) or "*hist" (*4478) sequence.
Home 

Jump to the first page in the sequence of hyperlinks. This command is activated
by pressing "*ho" (*46) or "*home" (*4663) sequence.

Reload 

Reload the current page again from the Web server. This command is activated by 

pressing "*re" (*73) or "*relo" (*7356) sequence.

Help 

Jump to the home page of the help page set. Help pages are navigated in exactly
the same way as ordinary HVML pages. However, a new browser instance is
created on activation, which must be "exited" to get back to the page context from
which the "Help" page set was accessed. This command is activated by pressing "*h"
(*4) or "*help" (*4357) sequence.
Fax 

Jump to the home page of the Fax dialog session using HTML forms. Again, a 
new browser instance is created on activation which must be "exited" to get back 
to the page context from which "Fax" dialog session was activated. This 
command is activated by pressing "*fa" (*32) or "*fax" (*329) sequence.
Stop 

Stop loading the page that is currently being accessed. This command is activated 

by pressing "*t" (*8) or "*stop" (*7867) sequence.

Exit 

Exit the current instance of the browser and return to the page being accessed in 
the previous instance of the browser. If this is the first instance of the browser, 
then exit the browser and hang up the phone. This command is activated by
pressing "*x" (*9) or "*exit" (*3948) sequence.
Bookmarks 

Present the titles of the pages selected as bookmarks in the order of their hyper
link access sequence. Pause after each title. If the user presses then jump to
the page specified by the title. If not, proceed to the next title. This command is
activated by pressing "*bo" (*26) or "*book" (*2665) sequence.
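The digit sequences listed above follow mechanically from the standard keypad letter groups, so a browser could derive its dispatch table from the keywords and check the uniqueness claim automatically. This Python sketch does that; `dial`, `build_dispatch_table` and `DISPATCH` are hypothetical names. Note that "*4" (Help) is a prefix of "*44" (History), so a real browser would also need a rule for deciding when a sequence is complete (for example a short inter-digit timeout); the sketch only matches complete sequences.

```python
from typing import Dict, Tuple

KEYPAD = {"2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
          "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ"}
LETTER_TO_KEY = {letter: key for key, letters in KEYPAD.items() for letter in letters}

# Short and long keywords for each browser command, as listed above.
COMMANDS: Dict[str, Tuple[str, str]] = {
    "Previous": ("pr", "prev"),
    "Next": ("n", "next"),
    "History": ("hi", "hist"),
    "Home": ("ho", "home"),
    "Reload": ("re", "relo"),
    "Help": ("h", "help"),
    "Fax": ("fa", "fax"),
    "Stop": ("t", "stop"),
    "Exit": ("x", "exit"),
    "Bookmarks": ("bo", "book"),
}


def dial(keyword: str) -> str:
    """Touch tone sequence for a keyword, including the leading '*'."""
    return "*" + "".join(LETTER_TO_KEY[ch] for ch in keyword.upper())


def build_dispatch_table() -> Dict[str, str]:
    """Map every dialed sequence to its command, rejecting collisions."""
    table: Dict[str, str] = {}
    for command, keywords in COMMANDS.items():
        for keyword in keywords:
            sequence = dial(keyword)
            if sequence in table:
                raise ValueError(f"{keyword!r} collides with {table[sequence]!r}")
            table[sequence] = command
    return table


DISPATCH = build_dispatch_table()
```

Building the table succeeds for all twenty keywords, which confirms that the default sequences above are pairwise distinct.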

III. Voice Browser Playback Controls

When the voice browser is activated to play back voice prompts or speech segments, an
additional set of browser commands is available to the user to control the playback.
Pause 

Pause the play back at current position. This command is activated by pressing 
"*p" (*7) or "*pause" (*72873).
Play

Continue playback from the current position. This command is activated by pressing
"*p" (*7) or "*play" (*7529) sequence.

Backup 

Back up the playback position by 5 seconds and start playback. The command is
activated by pressing "*b" (*2) or "*back" (*2225). Repeated pressing of the
same tone backs up a further 5 seconds for each tone.

Forward 

Advance the playback position by 5 seconds and start playback. The command is
activated by pressing "*f" (*3) or "*frwd" (*3793). Repeated pressing of the same
tone skips forward a further 5 seconds for each tone.

Start 

Back up the playback position to the beginning of the playback sequence and
start playback. The command is activated by pressing "*0".

End 

Jump to the end of the playback sequence, back up by 5 seconds and start
playback. The command is activated by pressing

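The playback controls amount to moving a cursor within the sequence being played: Backup and Forward shift it by 5 seconds per tone, Start resets it to zero, End places it 5 seconds before the end, and Pause and Play only toggle whether audio is flowing. A minimal sketch of that arithmetic follows. The class and method names are hypothetical, and clamping the position to the bounds of the sequence is an assumption the source does not specify.

```python
STEP_SECONDS = 5.0  # back-up / skip-forward increment given in the text


class PlaybackControl:
    """Hypothetical playback cursor for a single prompt or speech segment."""

    def __init__(self, duration: float) -> None:
        self.duration = duration  # length of the playback sequence, in seconds
        self.position = 0.0
        self.playing = False

    def _seek_and_play(self, target: float) -> None:
        # Assumption: clamp to [0, duration] so repeated presses stop at the ends.
        self.position = min(max(target, 0.0), self.duration)
        self.playing = True

    def pause(self) -> None:
        # Hold the current position without moving it.
        self.playing = False

    def play(self) -> None:
        # Resume from wherever the cursor currently is.
        self.playing = True

    def backup(self) -> None:
        # Each press backs up a further 5 seconds.
        self._seek_and_play(self.position - STEP_SECONDS)

    def forward(self) -> None:
        # Each press skips forward a further 5 seconds.
        self._seek_and_play(self.position + STEP_SECONDS)

    def start(self) -> None:
        # Restart from the beginning of the sequence.
        self._seek_and_play(0.0)

    def end(self) -> None:
        # Jump to the end, then back up by one step.
        self._seek_and_play(self.duration - STEP_SECONDS)


player = PlaybackControl(duration=30.0)
player.forward()
player.forward()
print(player.position)  # 10.0
player.end()
print(player.position)  # 25.0
```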


What is claimed is: 

1. A method of delivering caller-customized voice-based
information to a caller, comprising:

storing caller-specific information in a computer file at a
universal resource locator (URL);
determining a URL associated with the caller;
retrieving the caller-specific information using the URL;
processing at least one caller command received over
the telephone to determine a service request;
retrieving information responsive to the service request
and responsive to the caller-specific information,
including:
generating a database query form responsive to the
service request;
customizing the database query form using the
caller-specific information; and
performing a database search using the query form,
wherein generating a database query form respon-
sive to the service request includes:
storing a voice form associated with the service
request at a universal resource locator (URL)
address in the computer network wherein the
voice form is stored in a markup language;
playing the voice form to the caller to generate at
least one information prompt for the caller;
collecting information from the caller in response
to each prompt; and
generating a database query form using at least a
portion of the collected information; and
playing back the retrieved information to the
caller over the telephone.
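The steps of claim 1 can be traced end to end in a small sketch: look up the caller's profile URL, fetch the caller-specific information, play each prompt of the voice form and collect an answer, fill in the remaining query fields from the profile, search, and produce text for playback. Everything here is a hypothetical stand-in: the dictionaries replace documents fetched over the network, the voice form is reduced to (field, prompt) pairs rather than a markup document, keying the profile URL on a caller ID is an assumption, and the `ask` callback stands in for the prompt-and-collect step (touch tone or speech recognition).

```python
from typing import Callable

# Hypothetical stand-ins for documents published at URLs on the voice web.
PROFILE_URLS = {"6175551234": "http://voiceweb.example/profiles/6175551234"}
PROFILES = {"http://voiceweb.example/profiles/6175551234": {"city": "Boston"}}
# A voice form reduced to (field, prompt) pairs; the claim stores it as markup.
VOICE_FORMS = {"restaurants": [("cuisine", "What kind of food would you like?")]}
LISTINGS = [
    {"name": "Harbor Grill", "cuisine": "seafood", "city": "Boston"},
    {"name": "Pier Seven", "cuisine": "seafood", "city": "Portland"},
    {"name": "Noodle Bar", "cuisine": "noodles", "city": "Boston"},
]


def handle_request(caller_id: str, service: str, ask: Callable[[str], str]) -> str:
    # Determine the URL associated with the caller and retrieve the profile.
    profile = PROFILES[PROFILE_URLS[caller_id]]
    # Play each prompt of the voice form and collect the caller's answer.
    query = {field: ask(prompt) for field, prompt in VOICE_FORMS[service]}
    # Customize the query form with caller-specific data not asked for.
    query.setdefault("city", profile["city"])
    # Perform the database search using the query form.
    hits = [
        row["name"]
        for row in LISTINGS
        if all(row.get(key) == value for key, value in query.items())
    ]
    # Text that a speech synthesizer would play back over the telephone.
    return "Found: " + ", ".join(hits) if hits else "No matches found."


print(handle_request("6175551234", "restaurants", lambda prompt: "seafood"))
# Found: Harbor Grill
```

The profile supplies the city, so the caller is asked only for the cuisine; that is one way the "customizing" step can shorten the dialog.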

2. The method of claim 1 wherein collecting information
from the caller in response to each prompt includes collect-
ing touch tone inputs from the caller.

3. The method of claim 1 wherein collecting information
from the caller in response to each prompt includes collect-
ing voice command inputs from the caller and performing
speech recognition on the voice command inputs.

4. A method of processing voice-based information
received from a telephone caller over a computer network,
comprising:

storing a voice form at a universal resource locator (URL)
address in the computer network wherein the voice
form is stored in a markup language with voice exten-
sions; and
during a calling session:
playing the voice form to the caller to generate at least
one information prompt to the caller;
collecting information from the caller in response to
each prompt; and
storing the collected information in a first markup
language document and including in the document a
hyperlink to a second markup language document.

5. The method of claim 4 wherein the hyperlink is 
determined responsive to at least a portion of the collected 
information. 

6. A method of processing voice-based information
received from a telephone caller over a computer network,
comprising:

storing a voice form at a universal resource locator (URL)
address in the computer network wherein the voice
form is stored in a markup language with voice exten-
sions; and
during a calling session:
playing the voice form to the caller to generate at least
one information prompt for the caller;
collecting information from the caller in response to
each prompt; and
storing the collected information in a first markup
language document and including in a second
markup language document a hyperlink to the first
markup language document.

7. The method of claim 6 wherein the hyperlink is 
determined responsive to at least a portion of the collected 
information. 
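Claims 4 to 7 differ only in the direction of the hyperlink: in claims 4 and 5 the document holding the collected answers links onward to another document (possibly one chosen from the answers), while in claims 6 and 7 a second document links back to the one holding the answers. A minimal sketch of both directions follows. The helper names and URLs are hypothetical, and plain HTML is used; the voice extensions to the markup language are not modeled.

```python
from html import escape


def anchor(href: str, text: str) -> str:
    # A hyperlink that a voice browser could announce and let the caller select.
    return f'<a href="{escape(href, quote=True)}">{escape(text)}</a>'


def response_document(fields: dict[str, str], extra: str = "") -> str:
    """Store collected answers in a markup document, optionally with a link."""
    rows = "".join(f"<li>{escape(k)}: {escape(v)}</li>" for k, v in fields.items())
    return f"<html><body><ul>{rows}</ul>{extra}</body></html>"


answers = {"name": "Pat Lee", "city": "Boston"}

# Claims 4-5: the answers document links onward, the target chosen from an answer.
city_url = "http://voiceweb.example/cities/" + answers["city"].lower() + ".html"
first = response_document(answers, anchor(city_url, answers["city"] + " guide"))

# Claims 6-7: a second document links back to where the answers were stored.
first_url = "http://voiceweb.example/responses/0001.html"
second = response_document({}, anchor(first_url, "Response from " + answers["name"]))

print(first)
print(second)
```

Escaping both the collected values and the link text matters here because the answers come from the caller and end up inside published markup.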

8. A system for delivering information over a telephone,
comprising:

a business white pages database including business name,
address and phone number information;
a database query form;
a first processing agent programmed to:
collect user information using a voice based telecom-
munications device;
include at least some of the collected information to the
database query form;
search the database by applying the database query
form to the database to retrieve information; and
generate a voice web page having a universal resource
locator (URL) address using the retrieved informa-
tion;
a yellow page database including business advertising
information; and
a second processing agent wherein the voice web page
generated by the first processing agent includes a
hyperlink to the second processing agent and wherein
the second processing agent is programmed to:
search the yellow page database to retrieve informa-
tion; and
generate a voice web page using the retrieved infor-
mation; and
a voice web browser adapted to play voice web pages
to a user.

9. The system of claim 8 wherein the hyperlink identifies 
an entry in the yellow page database and wherein searching 
the yellow page database comprises locating the yellow page 
database entry identified by the hyperlink. 
10. The system of claim 8 further comprising:

an order page database including business order informa-
tion; and
a third processing agent wherein the voice web page
generated by the second processing agent includes a
second hyperlink to the third processing agent and
wherein the third processing agent is programmed to:
search the order page database to retrieve information;
and
generate a voice web page using the retrieved infor-
mation.

11. The system of claim 10 wherein the second hyperlink 
identifies an entry in the order page database and wherein 
searching the order page database comprises locating the 
order page database entry identified by the hyperlink. 
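Claims 8 to 11 describe a chain of processing agents: a white pages search produces a voice web page whose hyperlink names the yellow page agent and identifies an entry, and the yellow page result in turn links to an order page agent. The sketch below models that chain with in-memory tables. All names, URLs, and data are hypothetical, the pages are plain HTML without voice extensions, and `follow_first_link` only imitates what a voice web browser would do when the caller selects a link.

```python
import re
from html import escape
from urllib.parse import parse_qs, urlparse

BASE = "http://voiceweb.example/agents"  # hypothetical agent location

# Toy stand-ins for the three databases named in claims 8-11.
WHITE_PAGES = [
    {"id": "b17", "name": "Harbor Grill", "address": "1 Pier Road", "phone": "555-0100"},
]
YELLOW_PAGES = {"b17": {"ad": "Fresh seafood daily", "order_id": "o17"}}
ORDER_PAGES = {"o17": {"items": "chowder, lobster roll", "phone": "555-0101"}}


def page(*parts: str) -> str:
    """Wrap generated content as a (plain HTML) voice web page."""
    return "<html><body>" + "".join(parts) + "</body></html>"


def link(agent: str, entry: str, text: str) -> str:
    # The hyperlink names the next agent and identifies the database entry.
    return f'<a href="{BASE}/{agent}?entry={entry}">{escape(text)}</a>'


def white_pages_agent(query: str) -> str:
    hits = [r for r in WHITE_PAGES if query.lower() in r["name"].lower()]
    return page(*(
        f"<p>{escape(r['name'])}, {escape(r['address'])}, {escape(r['phone'])}</p>"
        + link("yellow", r["id"], "Hear advertisement")
        for r in hits
    ))


def yellow_pages_agent(entry: str) -> str:
    row = YELLOW_PAGES[entry]
    return page(f"<p>{escape(row['ad'])}</p>", link("order", row["order_id"], "Place an order"))


def order_pages_agent(entry: str) -> str:
    row = ORDER_PAGES[entry]
    return page(f"<p>Order: {escape(row['items'])}. Call {escape(row['phone'])}.</p>")


AGENTS = {"yellow": yellow_pages_agent, "order": order_pages_agent}


def follow_first_link(html: str) -> str:
    """Dispatch the first hyperlink on a page to the agent it names."""
    href = re.search(r'href="([^"]+)"', html).group(1)
    url = urlparse(href)
    agent = AGENTS[url.path.rsplit("/", 1)[-1]]
    return agent(parse_qs(url.query)["entry"][0])


results = white_pages_agent("harbor")
ad_page = follow_first_link(results)
print(follow_first_link(ad_page))
# <html><body><p>Order: chowder, lobster roll. Call 555-0101.</p></body></html>
```

Encoding the entry identifier in the hyperlink, as claims 9 and 11 describe, keeps each agent stateless: the page itself carries everything the next agent needs.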